When building systems that need to store and compare data in a sorted order, we often run into challenges with how to represent our data while preserving the ordering we want. Here are some common patterns (and pitfalls) I’ve observed when working with ordered data. For most of this post, we’ll be considering a generic store that needs to store, compare, and retrieve data which is stored as “raw bytes” or “byte strings”. Bytes are typically compared by value, which leads to a “byte-lexicographical” ordering. This is a common comparator method seen in real-world databases, KV-stores, and other data systems. Integers Integers are generally easy to store and compare… right? Well, not always! For starters, fixed-width integers are simple to store but can waste space for small numbers — which are often prevalent, depending on your data. More importantly, if you’re dumping integers directly from memory, you might run into endianness issues when reinterpreting integers as arrays of bytes during comparisons/ordering. A 32-bit integer, 0x12345678 could be stored as: 0x12 0x34 0x56 0x78 in big-endian or “network order” (used in TCP, older architectures) 0x78 0x56 0x34 0x12 in little-endian (used by both x86 & ARM, among others) Comparing byte-by-byte in little endian wouldn’t preserve ordering between two numbers! Alright, big-endian. But we’re still storing 32 bits (or 64, or 128!) for every number, no matter how big the number actually is. Here, we borrow ideas from serialization (“wire”) formats, like Protocol Buffers, Cap’n Proto, or Flatbuffers, and use variable-length integers (varints). Here’s a typical varint implementation, using the protobuf-style encoding where each byte uses its most significant bit as a continuation flag: ValueEncodedFirst Byte0x1000000b1_0001_000 0b1_0_0000_00 0b1_00_0000_0 0b0_000_00000x880x20000b1_0010_000 0b1_0_0000_00 0b0_00_0000_00x90 While 0x100000 > 0x2000, their encoded forms don’t preserve this ordering when compared byte-by-byte! The f...
First seen: 2025-12-17 17:08
Last seen: 2025-12-17 22:09