Our infrastructure team has developed a custom automated reinforcement learning system to continuously improve our models for proving Lean theorems. A critical part of this system is our Lean execution framework, which we’ve named the REPL service. The REPL service is responsible for all interactions between our models and actual Lean proofs. This service enables our RL system to scale independently of our GPU capacity, is semantically stateless, and scales to hundreds of thousands of CPUs. All of this achieved while using preemptible instances, which are the least expensive form of computing power on the market. In this article, we'll first provide background on Lean before describing the design constraints of the REPL service. We will then recount the story of our attempts to scale it up, including both our failures and successes. Finally, we'll outline future work and the key takeaways from this project. Let’s get started! Background Lean is a powerful language designed for interactively proving theorems. Here is a partial Lean proof of the fact that if x = 2 or x = − 2 , then x 2 = 4 : theorem my_theorem (x : ℤ) (h : x = 2 ∨ x = -2) : x * x = 4 := by rcases h with ha | hb . rw [ha] rfl . have my_lemma : -2 * -2 = 4 := by sorry rw [hb] ; exact my_lemma At a high level, the proof consists of tactics, such as rfl, exact, rcases, etc., each of which represents a logical “step” in the proof. At each point between tactics in a Lean proof, there is a proof state, which consists of the available variables and any already-proven facts, along with the goals, which contain what still needs to be proved. Each tactic modifies the current set of goals, producing zero or more subgoals. If no subgoals are produced, that branch of the proof is complete. Tactics may take arguments, and can be nested, as shown above. The above proof could be represented as a tree, where the proof states are nodes (blue) and the tactics are edges (magenta): The keyword sorry means the proof is inco...
First seen: 2026-01-13 22:07
Last seen: 2026-01-13 23:07