Lies, Damned Lies and Proofs: Formal Methods Are Not Slopless

https://news.ycombinator.com/rss Hits: 6
Summary

We appreciate comments from Christopher Henson, Zeke Medley, Ankit Kumar, and Pete Manolios. This post was initialized by Max’s twitter thread. IntroductionThere's been a lot of chatter recently on HN and elsewhere about how formal verification is the obvious use-case for AI. While we broadly agree, we think much of the discourse is kinda wrong because it incorrectly presumes formal = slopless.[1]Over the years, we have written our fair share of good and bad formal code. In this post, we hope to convince you that formal code can be sloppy, and that this has serious implications for anyone who hopes to bootstrap superintelligence by using formality to reinforce “good” reasoning.A mainstay on the Lean Zulip named Gas Station Manager has written that hallucination-free program synthesis[2]is achievable by vibing software directly in Lean, with the caveat that the agent also needs to prove the software correct. The AI safety case is basically: wouldn’t it be great if a cheap (i.e. O(laptop)) signal could protect you from sycophantic hubris and other classes of mistake, without you having to manually audit all outputs?A fable right outta aesopRecently a computer scientist (who we will spare from naming) was convinced he had solved a major mathematics problem. Lean was happy with it, he reasoned, given that his proof mostly worked, with just a few red squigglies. As seasoned proof engineers, we could have told him that in proof engineering, the growth in further needed edits is superlinear in number-of-red-squigglies (unlike in regular programming). The difference between mistakes in a proof and mistakes in a program is that you cannot fix a broken proof in a way that changes its formal goal (the theorem statement). In contrast, many, if not most changes to traditional software impact its formal spec, for example by adding a side-effect or changing the shape of an output. Therefore proof bugs are 1) harder to fix, and 2) more likely to imply that your goal is fundamentall...

First seen: 2026-01-17 10:23

Last seen: 2026-01-17 15:24