SIMD City: Auto-vectorisation

Written by me, proof-read by an LLM. Details at end.

It’s time to look at one of the most sophisticated optimisations compilers can do: autovectorisation. Most “big data” style problems boil down to “do this maths to huge arrays”, and the limiting factor isn’t the maths itself, but the feeding of instructions to the CPU, along with the data it needs.

To help with this problem, CPU designers came up with SIMD: “Single Instruction, Multiple Data”. One instruction tells the CPU what to do with a whole chunk of data. These chunks could be 2, 4, 8, 16 or similar units of integers or floating point values, all treated individually. Initially the only way to use this capability was to write assembly language directly, but luckily for us, compilers are now able to help.

To take advantage of SIMD we need to ensure our data is laid out in a nice long line, like an array, or vector. It’s also much better to store different “fields” of the data in separate arrays. We’re going to start with some integer maths - let’s update an array x to be element-wise the max of x and y:

In this -O2-compiled code, we don’t see any vectorisation, just the nice tight loop code we’ve come to expect:

    .L3:
      mov edx, DWORD PTR [rsi+rax*4]    ; read y[i]
      cmp edx, DWORD PTR [rdi+rax*4]    ; compare with x[i]
      jle .L2                           ; if less, skip next instr
      mov DWORD PTR [rdi+rax*4], edx    ; x[i] = y[i]
    .L2:
      add rax, 1                        ; ++i
      cmp rax, 65536                    ; are we at 65536?
      jne .L3                           ; keep looping if not

To ramp up to the next level, we’ll need to add two flags: first we need to turn up the optimiser to -O3, and then we need to tell the compiler to target a CPU that has the appropriate SIMD instruction with something like -march=skylake. Our inner loop has gotten a little more complex, but spot the cool part:

    .L4:
      ; Reads 8 integers into ymm1, y[i..i+7]
      vmovdqu ymm1, YMMWORD PTR [rsi+rax]
      ; Compares that with 8 integers x[i..i+7]
      vpcmpgtd ymm0, ymm1, YMMWORD PTR [rdi+rax]
      ; Were all 8 values less or equal?
      vptes...
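
The C++ source for the max loop doesn't appear in the text above, so here is a minimal sketch of what it might look like. The function name maxArray, the constant kCount and the exact signature are assumptions for illustration; only the element count of 65536 comes from the `cmp rax, 65536` in the listing. The branchy `if` mirrors the compare-and-skip pattern in the -O2 assembly, though the original may well have been written differently (with std::max, say).

```cpp
#include <cstddef>  // std::size_t

// Assumption: the array length is fixed at 65536, the bound that the
// assembly compares the loop counter against.
constexpr std::size_t kCount = 65536;

// Hypothetical name and signature, not taken from the article.
// Updates x in place so that x[i] becomes max(x[i], y[i]).
void maxArray(int* x, const int* y) {
    for (std::size_t i = 0; i < kCount; ++i) {
        // Store only when y[i] is strictly greater, matching the
        // "jle: skip the store" shape of the scalar loop.
        if (y[i] > x[i]) {
            x[i] = y[i];
        }
    }
}
```

Building something like this once with -O2 and once with -O3 -march=skylake should let you compare the scalar and vectorised loops for yourself, though the exact instructions you get will depend on the compiler and its version.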