Illuminating the processor core with llvm-mca

https://news.ycombinator.com/rss Hits: 10
Summary

Performance Tip of the Week #99: Illuminating the processor core with llvm-mca

Originally posted as Fast TotW #99 on September 29, 2025
By Chris Kennelly
Updated 2025-10-07
Quicklink: abseil.io/fast/99

The RISC versus CISC debate ended in a draw: Modern processors decompose instructions into micro-ops handled by backend execution units. Understanding how instructions are executed by these units can give us insights on optimizing key functions that are backend bound. In this episode, we walk through using llvm-mca to analyze functions and identify performance insights from its simulation.

Preliminaries: Varint optimization

llvm-mca, short for Machine Code Analyzer, is a tool within LLVM. It uses the same datasets that the compiler uses for making instruction scheduling decisions. This ensures that improvements made to compiler optimizations automatically flow towards keeping llvm-mca representative. The flip side is that the tool is only as good as LLVM's internal modeling of processor designs, so certain quirks of individual microarchitecture generations might be omitted. It also models the processor behavior statically, so cache misses, branch mispredictions, and other dynamic properties aren't considered.

Consider Protobuf's VarintSize64 method:

    size_t CodedOutputStream::VarintSize64(uint64_t value) {
    #if PROTOBUF_CODED_STREAM_H_PREFER_BSR
      // Explicit OR 0x1 to avoid calling absl::countl_zero(0), which
      // requires a branch to check for on platforms without a clz instruction.
      uint32_t log2value = (std::numeric_limits<uint64_t>::digits - 1) -
                           absl::countl_zero(value | 0x1);
      return static_cast<size_t>((log2value * 9 + (64 + 9)) / 64);
    #else
      uint32_t clz = absl::countl_zero(value);
      return static_cast<size_t>(
          ((std::numeric_limits<uint64_t>::digits * 9 + 64) - (clz * 9)) / 64);
    #endif
    }

This function calculates how many bytes an encoded integer will consume in Protobuf's wire format.
It first computes the number of bits needed to represent the value by finding the log...

First seen: 2025-12-14 15:55

Last seen: 2025-12-15 00:56