# I/O is no longer the bottleneck?

Nov 27th, 2022

Recently Ben Hoyt published a blog post claiming that, contrary to popular belief, I/O is not the bottleneck in typical programming interview problems such as counting word frequencies from a stream: sequential read speed has come a long way, while CPU speed has stagnated.

Sequential reads are indeed incredibly fast. Using the same method as linked in Ben Hoyt's post, I'm getting 1.6 GB/s sequential reads on a cold cache, and 12.8 GB/s on a warm cache (best of five). But it should be possible to count word frequencies at 1.6 GB/s even on a single thread, right?

(For the impatient: the code is available on GitHub.)

## The optimized C implementation

Ben Hoyt's blog refers to an earlier post which includes a faster C version of the word frequency counter. I compiled optimized.c with GCC 12, using the `-O3 -march=native` flags, and ran it on the 425 MB input file (100 copies of the King James Version of the Bible). The result was surprisingly bad:

```
$ time ./optimized < bible-100.txt > /dev/null

real    0m1.525s
user    0m1.477s
sys     0m0.048s
```

That is only 278 MB/s on a warm cache.

## Vectorization

Looking at the code, I realized that one of the hot loops had many branches, including an early exit, which prevents the compiler from vectorizing it:

```c
for (; i < num_process; i++) {
    char c = buf[i];
    if (c <= ' ') {
        break;
    }
    if (c >= 'A' && c <= 'Z') {
        c += ('a' - 'A');
        buf[i] = c;
    }
    hash *= FNV_PRIME;
    hash ^= (uint64_t)c;
}
```

My first attempt at improving performance was to hoist the lowercasing logic out of the loop, like so:

```c
for (int i = 0; i < BUF_SIZE; ++i) {
    buf[i] = buf[i] >= 'A' && buf[i] <= 'Z' ? buf[i] - 'A' + 'a' : buf[i];
}
```

This simple change improved performance to 330 MB/s (using Clang for better vectorization). Funnily enough, just adding these three lines before the loop, without deleting the original lowercasing code, gives comparable speed; that is strictly more work, but branch prediction does its job.
Still, that is about a factor of 5 away from the cold-cache sequential read speed...