Current State of Git for Data Work

Git for code is very well known; Git for data, not so much. Let's explore the current state of Git for data.

How Does Git Work?

To understand Git for data, we first need to understand how branching works in Git, so we can apply it to data.

In Git, each commit records a snapshot of the code together with its metadata, and every object is identified by a hash. A branch is just a pointer to one of those commits. But Git is not made for data: it was designed with code versioning in mind, not large binary files or datasets. As Linus Torvalds, the creator of Git, has himself noted, large files were never part of the intended use case. An architecture that stores complete snapshots and computes hashes for everything works well for text-based code but becomes unwieldy with large data files. Yet as data practitioners, we actively want to work with data, that is, with state, which is always harder than working with code alone.

Git and Git-like solutions (alternatives are Tangled and Gitea) work. But which of their features do we want for data? And which ones do we need beyond what versioning code requires?

Git has concepts like versioning, rollback, diffs, lineage, branch/merge, and sharing. On the data side, which we get into more later, we have concepts such as files vs. tables, structured vs. unstructured, schema vs. data, branching, and time travel.

For data, we need a storage layer or format optimized for large data, schemas, and column types, without necessarily duplicating the data. We also need to be able to revert both code and state easily, for example to roll back a data pipeline run that put production in an incorrect state.

If we look at The Struggle of Enterprise Data Integration, we can see that much of what enterprises struggle with in data is change management and managing complexity. So, hopefully, Git for data will help us with this?

How Does It Work with Data?

Data works differently.
We need an open way of sharing and moving data that we can then version, branch off to different versions easily, an...
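To make the earlier point about hashes and snapshots concrete, here is a minimal sketch in Python. The `git_blob_hash` function follows Git's real blob hashing (SHA-1 over a `blob <size>\0` header plus the content), but `TinyStore` and everything around it are hypothetical names invented for illustration. The sketch also deliberately ignores Git's compression and packfile delta encoding, so it shows the naive snapshot model rather than Git's actual on-disk behavior.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash bytes the way Git hashes a blob: SHA-1 over a header plus the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class TinyStore:
    """Hypothetical toy object store: blobs keyed by hash, branches as pointers."""

    def __init__(self):
        self.objects = {}   # hash -> raw bytes (content-addressed)
        self.commits = []   # each commit is a {path: hash} snapshot
        self.branches = {}  # branch name -> index into self.commits

    def commit(self, branch, files):
        snapshot = {}
        for path, content in files.items():
            digest = git_blob_hash(content)
            # Identical content hashes to the same key, so it is stored once.
            self.objects.setdefault(digest, content)
            snapshot[path] = digest
        self.commits.append(snapshot)
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.objects.values())


store = TinyStore()
code = b"print('etl')\n"
data_v1 = b"id,value\n" + b"1,10\n" * 100_000
data_v2 = data_v1 + b"2,20\n"  # the same dataset with one appended row

store.commit("main", {"etl.py": code, "data.csv": data_v1})
size_after_first = store.stored_bytes()

store.commit("main", {"etl.py": code, "data.csv": data_v2})
size_after_second = store.stored_bytes()

# Creating a branch is cheap: it only copies a pointer, not any data.
store.branches["experiment"] = store.branches["main"]

print(len(store.objects), size_after_second - size_after_first)
```

The unchanged script is stored only once, while appending a single row to the dataset adds a complete second copy of the file to the store. That is the cost a data-oriented storage layer tries to avoid.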