# SnapBench

Inspired by Pokémon Snap (1999). A VLM pilots a drone through a 3D world to locate and identify creatures.

## Architecture

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'background': '#ffffff', 'primaryColor': '#ffffff'}}}%%
flowchart LR
    subgraph Controller["**Controller** (Rust)"]
        C[Orchestration]
    end
    subgraph VLM["**VLM** (OpenRouter)"]
        V[Vision-Language Model]
    end
    subgraph Simulation["**Simulation** (Zig/raylib)"]
        S[Game State]
    end
    C -->|"screenshot + prompt"| V
    C <-->|"cmds + state<br>**UDP:9999**"| S
    style Controller fill:#8B5A2B,stroke:#5C3A1A,color:#fff
    style VLM fill:#87CEEB,stroke:#5BA3C6,color:#1a1a1a
    style Simulation fill:#4A7C23,stroke:#2D5A10,color:#fff
    style C fill:#B8864A,stroke:#8B5A2B,color:#fff
    style V fill:#B5E0F7,stroke:#87CEEB,color:#1a1a1a
    style S fill:#6BA33A,stroke:#4A7C23,color:#fff
```

## Overview

The simulation generates procedural terrain and spawns creatures (cat, dog, pig, sheep) for the drone to discover. It handles drone physics and collision detection, accepting 8 movement commands plus `identify` and `screenshot`.

The Rust controller captures frames from the simulation, constructs prompts enriched with position and state data, then parses VLM responses into executable command sequences.

The objective: locate and successfully identify 3 creatures, where `identify` succeeds when the drone is within 5 units of a target.

demo_3x.mov

## Gotta catch 'em all?

I gave 7 frontier LLMs a simple task: pilot a drone through a 3D voxel world and find 3 creatures. Only one could do it.

Is this a rigorous benchmark? No. However, it's a reasonably fair comparison - same prompt, same seeds, same iteration limits. I'm sure with enough refinement you could coax better results out of each model. But that's kind of the point: out of the box, with zero hand-holding, only one model figured out how to actually fly.

## Why can't Claude look down?

The core differentiator wasn't intelligence - it was altitude control. Creatures sit on the ground.
To identify them, y...
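
To make the altitude point concrete, here is a minimal Rust sketch of a range check. It assumes the 5-unit `identify` range is measured as straight-line 3D distance and that `y` is altitude; the README does not say how the simulation actually measures it. `Vec3`, `distance`, and `can_identify` are hypothetical names, not taken from the project.

```rust
/// A point in world space. Hypothetical layout: the simulation's real
/// coordinate conventions are not given in the README.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32, // assumed to be altitude
    pub z: f32,
}

/// Range stated in the README: `identify` succeeds within 5 units of a target.
pub const IDENTIFY_RANGE: f32 = 5.0;

/// Straight-line (Euclidean) distance between two points.
pub fn distance(a: Vec3, b: Vec3) -> f32 {
    let (dx, dy, dz) = (a.x - b.x, a.y - b.y, a.z - b.z);
    (dx * dx + dy * dy + dz * dz).sqrt()
}

/// True when the drone is close enough to identify the creature,
/// assuming the range is checked in all three dimensions.
pub fn can_identify(drone: Vec3, creature: Vec3) -> bool {
    distance(drone, creature) <= IDENTIFY_RANGE
}
```

Under these assumptions, a drone hovering 10 units directly above a creature fails the check even though its horizontal distance is zero, while the same drone passes once it descends to 4 units.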