High-performance C++ hash table using grouped SIMD metadata scanning

https://news.ycombinator.com/rss Hits: 4
Summary

Grouped SIMD Hash Table A high-performance C++ hash table that beats state-of-the-art at scale using grouped SIMD metadata scanning. Results vs ankerl::unordered_dense (Robin Hood, considered SOTA): Size ankerl GroupedSIMD Winner 100k 2.76 ms 3.36 ms ankerl 500k 29.2 ms 34.0 ms ankerl 1M 72.3 ms 71.9 ms GroupedSIMD 2M 157.4 ms 156.3 ms GroupedSIMD Operation breakdown at 1M elements (80% load): Operation ankerl GroupedSIMD Speedup Insert 50.8 ms 70.1 ms 0.72x Lookup Hit 104 ms 61.5 ms 1.69x Lookup Miss 5.4 ms 4.5 ms 1.21x Use this when: Table size > 500k elements, lookup-heavy workloads. Insert overhead (0.72x) is acceptable when lookups dominate. Usage # include " grouped_simd_elastic.hpp " // Create table with capacity for 1M elements GroupedSIMDElastic< uint64_t , uint64_t > table ( 1200000 ); // Insert table.insert(key, value); // Lookup uint64_t * result = table.find(key); if (result) { // Found: *result is the value } // Check existence if (table.contains(key)) { ... } // Subscript operator table[key] = value; How It Works The Problem with SIMD in Hash Tables Traditional quadratic probing accesses scattered memory locations: Probe sequence: h, h+1, h+4, h+9, h+16, h+25... To use SIMD, you'd need to gather from 16 random positions—slower than scalar code. The Solution: Grouped Probing Probe 16 contiguous slots as a group, then jump to the next group: Group 0: [h+0, h+1, ..., h+15] ← SIMD scan (1 load) Group 1: [h+16, h+17, ..., h+31] ← SIMD scan (1 load) Group 2: [h+32, h+33, ..., h+47] ← SIMD scan (1 load) Within each group, SSE2 scans all 16 metadata bytes in ~3 instructions: __m128i metadata = _mm_loadu_si128((__m128i*)&metadata_[base]); __m128i matches = _mm_cmpeq_epi8(metadata, target); uint32_t mask = _mm_movemask_epi8(matches); This is the same insight behind Google's Swiss Tables. Metadata Format Each slot has a 1-byte metadata tag: Bit 7: Occupied flag (1 = occupied, 0 = empty) Bits 0-6: 7-bit hash fragment This filters out 127/128 non-matches before co...

First seen: 2025-12-29 21:01

Last seen: 2025-12-30 00:02