ANN v3: 200ms p99 query latency over 100B vectors

https://news.ycombinator.com/rss Hits: 3
Summary

ANN v3: 200ms p99 query latency over 100 billion vectors

January 21, 2026
Nathan VanBenschoten (Chief Architect)

The pursuit of scale is not vanity. When you take existing systems and optimize them from first principles to achieve a step change in scalability, you can create something entirely new. Nothing has demonstrated that more clearly than the explosion in deep learning over the past decade. The ML community took decades-old ideas and combined them with advancements in hardware, new algorithms, and hyper-specialization to forge something remarkable.

Both inspired by the ML community and in service of it, we recently rebuilt vector search in turbopuffer to support scales of up to 100 billion vectors in a single search index. We call this technology Approximate Nearest Neighbor (ANN) Search v3, and it is available now.

In this post, I'll dive into the technical details behind how we built for 100 billion vectors. Along the way, we'll examine turbopuffer's architecture, travel up the modern memory hierarchy, zoom into a single CPU core, and then back out to the scale of a distributed cluster.

Billion-scale ANN search

Let's look at the numbers to get a sense of the challenge: 100 billion vectors, 1024 dimensions per vector, 2 bytes per dimension (f16). This is vector search over 200TiB of dense vector data. We want to serve a high rate (> 1k QPS) of ANN queries over this entire dataset, each with a latency target of 200ms or less.

With a healthy dose of mechanical sympathy, let's consider how our hardware will run this workload and where it will encounter bottlenecks. If one part of the system bottlenecks (disk, network, memory, or CPU), other parts of the system will go underutilized. The key to making the most of the available hardware is to push down bottlenecks and balance resource utilization.

turbopuffer's architecture is simple and opinionated. This simplicity makes the exercise tractable.
turbopuffer's query tier is a stateless layer on top of object storage...
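
As a rough check on the figures quoted in the "Billion-scale ANN search" paragraph, the arithmetic can be sketched in a few lines of Python. This is only a back-of-envelope calculation, not anything taken from turbopuffer's code: the helper names are hypothetical, and the concurrency estimate assumes every query takes the full 200ms target (an upper bound, since that target is a tail latency) at exactly 1k QPS.

```python
# Back-of-envelope check of the workload figures quoted in the post.
NUM_VECTORS = 100 * 10**9   # 100 billion vectors (from the post)
DIMS = 1024                 # dimensions per vector (from the post)
BYTES_PER_DIM = 2           # f16 (from the post)


def dataset_bytes(num_vectors: int, dims: int, bytes_per_dim: int) -> int:
    """Raw size of the dense vector data, ignoring any index overhead."""
    return num_vectors * dims * bytes_per_dim


def in_flight_queries(qps: float, latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    return qps * latency_s


per_vector = DIMS * BYTES_PER_DIM                     # bytes per vector
total = dataset_bytes(NUM_VECTORS, DIMS, BYTES_PER_DIM)
tb = total / 10**12                                   # decimal terabytes
tib = total / 2**40                                   # binary tebibytes

# Assumption: 1k QPS, each query taking the full 200 ms target.
concurrency = in_flight_queries(1_000, 0.200)

print(f"bytes per vector: {per_vector}")
print(f"dataset size:     {tb:.1f} TB ({tib:.1f} TiB)")
print(f"queries in flight at 1k QPS and 200 ms: {concurrency:.0f}")
```

The raw data works out to 204.8 TB, or about 186 TiB, so the "200TiB" in the text reads best as a round figure. Under the stated assumptions, sustaining 1k QPS means on the order of 200 queries in flight at any moment.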

First seen: 2026-01-25 13:54

Last seen: 2026-01-25 15:54