Lessons from Hash Table Merging

https://news.ycombinator.com/rss Hits: 11
Summary

Lessons from Hash Table Merging Merging two hash maps seems like a O(N) operation. However, while merging millions of keys, I encountered a massive >10x performance degradation. This post explores why some of the most popular libraries fall into this trap and how to fix it. The source code is available here. I generated 3N random numbers with splitmix64, where N ranges from 7 to 31 million. I put the first N numbers in a hash map and the rest 2N numbers in a second hash map. Then I merged the second map into the first map and recorded timing on M4 Pro. I used a decent hash function copied from a blog post by Chris Wellons: static inline uint64_t hash64(uint64_t x) { x = (x ^ (x>>32)) * 0xd6e8feb86659fd93ULL; x = (x ^ (x>>32)) * 0xd6e8feb86659fd93ULL; return x ^ (x>>32); } I evaluated the following hash table libraries, all based on linear probing. Merging hash tables may be slow The plot on the left shows the time for inserting 2N keys to an empty hash map, and the plot on right shows the time for merging a map with 2N keys into another map with N keys. Notably, Abseil and Boost are >20X slower on hash table merging than hash table creation, while unordered_dense and khashl are not affected (I will come to them later). To understand the cause, suppose we have hash map h0 and h1 of the same size and we merge h1 into h0 with: for (auto const &iter : h1) h0[iter.first] += iter.second; With a typical iterator implementation, we traverse keys in table h1 in the order of their hash codes. Because h0 and h1 use the same hash function (also known as "hasher"), the original bucket position of a key in h1 is identical to the position in h0. With the loop above, we will populate the beginning of h0 first. That part of h0 is almost fully saturated although on average, h0 has not reached the targeted load factor. Such primary clustering leads to the poor performance. All Swiss Table-based implementations, including the Rust libraries in the table, suffer from this issue if a fix...

First seen: 2026-01-08 08:46

Last seen: 2026-01-08 18:48