German Strings

Strings are conceptually very simple: a string is essentially just a sequence of characters, right? Why, then, does every programming language have its own slightly different string implementation? It turns out that there is a lot more to a string than “just a sequence of characters”.

We’re no different and built our own custom string type that is highly optimized for data processing. Even though we didn’t expect it when we first wrote about it in our inaugural Umbra research paper, a lot of new systems have adopted our format. It is now implemented in DuckDB, Apache Arrow, Polars, and Facebook Velox.

In this blog post, we’d like to tell you more about the advantages of our so-called “German Strings” and the tradeoffs we made.

But first, let’s take a step back and look at how strings are commonly implemented.

How C does it

In C, strings are just a sequence of bytes with the vague promise that a \0 byte will terminate the string at some point.

This is a very simple model conceptually, but very cumbersome in practice:

What if your string is not terminated? If you’re not careful, you can read beyond the intended end of the string, a huge security problem!

Just calculating the length of the string forces you to iterate over the whole thing.

What if we want to extend the string? We have to allocate new memory, move the string there, and free the old location all by ourselves.

How C++ does it

C++ exposes much nicer strings through its standard library. The C++ standard doesn’t enforce a specific implementation, but here’s how libc++ does it:

Each string object stores its size (already better than C!), a pointer to the actual payload, and the capacity of the buffer in which the data is stored.
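To make the three fields concrete, here is a minimal sketch of such a layout. It is a simplified illustration, not libc++'s actual code: the names `SimpleString`, `make_simple_string`, and `destroy_simple_string` are hypothetical, and the sketch assumes that the capacity counts the whole buffer including the terminating \0 byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified layout for illustration only.
// The real libc++ std::string is more involved than this.
struct SimpleString {
    std::size_t size;      // number of bytes currently in use (excluding '\0')
    char*       data;      // pointer to the heap-allocated payload
    std::size_t capacity;  // total size of the buffer in bytes (assumption: includes '\0')
};

// Build a SimpleString that owns a copy of a C string.
inline SimpleString make_simple_string(const char* text) {
    assert(text != nullptr);
    // The length is computed once with an O(n) scan and then stored,
    // so later length queries do not have to walk the bytes again.
    std::size_t len = std::strlen(text);
    char* buffer = new char[len + 1];
    std::memcpy(buffer, text, len + 1);  // copy payload plus terminating '\0'
    return SimpleString{len, buffer, len + 1};
}

// Release the buffer owned by a SimpleString and reset its fields.
inline void destroy_simple_string(SimpleString& s) {
    delete[] s.data;
    s.data = nullptr;
    s.size = 0;
    s.capacity = 0;
}
```

With this layout, reading `size` is a single field access, whereas a C string needs a `strlen` scan every time the length is required.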
You can append to the string “for free” as long as the result still fits within the buffer capacity, and the string takes care of allocating a larger buffer and freeing the old one when it outgrows the current one: std::strings are mutable.

This string implementation also allow...