One million (small web) screenshots

Summary

Background

Last month I came across onemillionscreenshots.com and was pleasantly surprised at how well it worked as a tool for discovery. We all know the adage about judging books by their covers, but here… it just kinda works. Skip over the sites with bright, flashy colors begging for attention and instead seek out the negative space in between.

The one nitpick I have, though, is how they sourced the websites. They used the most popular websites from Common Crawl, which is fine, but not really what I'm interested in…

There are, of course, exceptions, but the relationship between popularity and quality is loose. McDonald's isn't popular because it serves the best cheeseburger; it's popular because it serves a cheeseburger that meets the minimum level of satisfaction for the maximum number of people. It's a profit-maximizing local minimum on the cheeseburger landscape.

This isn't limited to food, either: the NYT Best Sellers list, Spotify Top 50, and Amazon review volume are other good examples. For me, what's "popular" has become a filter for what to avoid. Lucky for us, though, there's a corner of the internet where substance still outweighs click-through rates. A place that's largely immune to the corrosive influence of monetization. It's called the small web, and it's a beautiful place.

A small web variant

The timing of this couldn't have been better. I'm currently working on a couple of tools focused specifically on small web discovery/recommendation, and I happen to already have most of the data required to pull this off. I just needed to take some screenshots, sooo… you're welcome, armchairhacker!

[full screen version: screenshots.nry.me]

Technical details

Because I plan on discussing how I gathered the domains in the near future, I'll skip it for now (it's pretty interesting). Suffice it to say, though, that once the domains are available, capturing the screenshots is trivial. And once those are ready, we have a fairly well-worn path to follow:

- generate visual embeddings
- dim...
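The post doesn't say which model produces the visual embeddings. The sketch below is a minimal, dependency-free illustration of the idea, not the author's pipeline. It swaps in a toy stand-in for a learned embedding: an L2-normalized RGB color histogram. It then ranks sites by cosine similarity, so that similar-looking screenshots end up near each other. A real pipeline would more likely use a pretrained image model. The function names (`color_histogram_embedding`, `nearest`), the flat pixel-list input, and the example domains are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each channel in 0..255


def color_histogram_embedding(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Hypothetical toy 'visual embedding': an L2-normalized RGB histogram.

    Each channel is quantized into `bins_per_channel` buckets, so the
    resulting vector has bins_per_channel ** 3 entries.
    """
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bucket index 0..bins_per_channel-1.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    norm = math.sqrt(sum(v * v for v in hist))
    # Normalize to unit length so a dot product equals cosine similarity.
    return [v / norm for v in hist] if norm > 0 else hist


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Inputs are assumed to be unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def nearest(query: str, embeddings: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Return the k sites whose embeddings are most similar to `query`'s."""
    q = embeddings[query]
    scored = [(cosine(q, v), name) for name, v in embeddings.items() if name != query]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    # Fake 'screenshots' as flat pixel lists (hypothetical data).
    shots = {
        "a.example": [(250, 250, 250)] * 90 + [(20, 20, 20)] * 10,  # light page, dark text
        "b.example": [(250, 250, 250)] * 80 + [(20, 20, 20)] * 20,  # similar layout
        "c.example": [(255, 0, 0)] * 100,                           # loud red page
    }
    embs = {name: color_histogram_embedding(px) for name, px in shots.items()}
    print(nearest("a.example", embs, k=1))  # → ['b.example']
```

A color histogram ignores layout and typography entirely, which is why a learned embedding is the usual choice. The surrounding steps (normalize, then compare by cosine similarity) stay the same either way.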

First seen: 2025-12-27 01:53

Last seen: 2025-12-27 16:55