Bruce Li introduces his The Long Now of the Web: Inside the Internet Archive’s Fight Against Forgetting thus:

This report delves into the mechanics of the Internet Archive with the precision of a teardown. We will strip back the chassis to examine the custom-built PetaBox servers that heat the building without air conditioning. We will trace the evolution of the web crawlers—from the early tape-based dumps of Alexa Internet to the sophisticated browser-based bots of 2025. We will analyze the financial ledger of this non-profit giant, exploring how it survives on a budget that is a rounding error for its Silicon Valley neighbors. And finally, we will look to the future, where the "Decentralized Web" (DWeb) promises to fragment the Archive into a million pieces to ensure it can never be destroyed.

It is long, detailed, comprehensive and well worth reading in full. Below the fold I comment on the part about storage.

This picture shows Brewster Kahle with the Internet Archive's very first storage, tape drives holding crawls from Alexa Internet, the Web traffic analysis company founded by Brewster and Bruce Gilliat in 1996 and, three years later, acquired by Amazon for $250M.

The first custom-designed Internet Archive storage system was version 1 of the PetaBox, a vividly red-painted 1U system introduced in 2004. This picture shows a rack of them. I'm familiar with them because the LOCKSS Program used them for some years. They were great! Li explains:

The first PetaBox rack, operational in June 2004, was a revelation in storage density. It held 100 terabytes (TB) of data—a massive sum at the time—while consuming only about 6 kilowatts of power. To put that in perspective, in 2003, the entire Wayback Machine was growing at a rate of just 12 terabytes per month.

The LOCKSS Program eventually transitioned to 4U systems similar to those then in use at Backblaze. Around the same time the Archive also moved to 4U systems for their PetaBox.
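
To make the figures Li quotes for the first PetaBox rack concrete, here is a rough back-of-the-envelope sketch. The three constants come straight from the quoted passage; the function names are hypothetical, the 6 kW figure is only "about" 6 kW, and the 12 TB per month growth rate is the 2003 number, so applying it to a rack that went live in June 2004 is an assumption made purely for illustration.

```python
# Figures quoted from Li's report (the power figure is approximate).
RACK_CAPACITY_TB = 100            # first PetaBox rack, June 2004
RACK_POWER_KW = 6                 # "about 6 kilowatts"
WAYBACK_GROWTH_TB_PER_MONTH = 12  # Wayback Machine growth rate in 2003


def watts_per_tb(capacity_tb: float, power_kw: float) -> float:
    """Power drawn per terabyte stored, in watts (hypothetical helper)."""
    return power_kw * 1000 / capacity_tb


def months_to_fill(capacity_tb: float, growth_tb_per_month: float) -> float:
    """Months of growth one rack could absorb, assuming a constant rate."""
    return capacity_tb / growth_tb_per_month


if __name__ == "__main__":
    w = watts_per_tb(RACK_CAPACITY_TB, RACK_POWER_KW)
    m = months_to_fill(RACK_CAPACITY_TB, WAYBACK_GROWTH_TB_PER_MONTH)
    print(f"Power per terabyte: {w:.0f} W/TB")
    print(f"Months of 2003-rate growth per rack: {m:.1f}")
```

On these assumptions a rack drew roughly 60 W per terabyte, and a single rack could absorb a little over eight months of Wayback Machine growth at the 2003 rate.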
As disk technology progressed the Arc...