Would you trust a medical system measured by which doctor the average Internet user would vote for?

No?

Yet that's the malpractice at the heart of LMArena.

The AI community treats this popular online leaderboard as gospel. Researchers cite it. Companies optimize for it and set it as their North Star. But beneath the sheen of legitimacy lies a broken system that rewards superficiality over accuracy.

It's like going to the grocery store, buying tabloids, and pretending they're scientific journals.

The Problem: Beauty Over Substance

Here's how LMArena is supposed to work: enter a prompt, evaluate two responses, and mark the better one. What actually happens: random Internet users spend two seconds skimming, then click their favorite.

They're not reading carefully. They're not fact-checking, or even trying.

This creates a perverse reward structure. The easiest way to climb the leaderboard isn't to be smarter; it's to hack human attention spans. We've seen it over and over in the data, both in the datasets LMArena has released and in the performance of models over time: the most reliable ways to boost your ranking are

Being verbose. Longer responses look more authoritative!
Formatting aggressively. Bold headers and bullet points look like polished writing!
Vibing. Colorful emojis catch your eye!

It doesn't matter if a model completely hallucinates. If it looks impressive, if it has the aesthetics of competence, LMArena users will vote for it over a correct answer.

The Inevitable Result: Madness

When you optimize for engagement metrics, you get madness.

Earlier this year, Meta tuned a version of Maverick to dominate the leaderboard. If you asked it "what time is it?", you got:

[Image: LMArena madness]

Voilà: bold text, emojis, and plenty of sycophancy (every trick in the LMArena playbook!), all to avoid answering the question it was asked.

The Data: 52% Wrong

It wasn't just Maverick. We analyzed 500 votes from the leaderboard ourselves.
We disagreed with 52% of them and strongly disagreed with 39%.

The leaderboard optimizes...