Alphaproof
Notes from Lecture 8 of this Open MOOC offered by Berkeley RDI
-–
A brief history of formalisation in mathematics What constitutes a mathematical proof evolved from words to symbols, this added -
- Rigor and Clarity
- Efficiency and Communication
- Abstraction and Generalisation
- Unification
- Created new fields
Computer formalisation of Mathematics
- Correctness concerns disappear - perfect verification (lean tells that it is not correct)
- Use software stacks - collaboration (authors can trust the proof instead of performing manual verification), gamified “levels” of theorems
- Lean aims to build Mathlib - aims to be general and unified
Reinforcement Learning
- The “Zero” philosophy - If an agent can learn to master an environment tabula rase (without any information) then you demonstrably have a system that is discovering and learning new knowledge by itself
- It also means that this algorithm is general and should apply to other domains
- A huge apart of science is to explore areas that are not well understood and generate and discover new interesting truths.
- AlphaGoZero and AlphaZero
- learned by simply playing games against itself, starting from complete random play
- rediscovered human patterns in days, and ultimately discarded some of it in preference of new discoveries
- AlphaTensor - what is the most efficient way to multiply two matrices together
- Naive - n^3
- 2 by 2 can be done in 7 moves at best
- alphatensor discovered the most efficient, novel and provably correct methods for 4 by 4 and 5 by 5 and others, which were previously undiscovered. Also can access huge action spaces (10^30)
- What made these systems work?
- Scaled trial and error
- Grounded feedback signal
- Search - sampling highest probability action
- Curriculum - self-play for AlphaGo gives the right opponent
- Lean allows the “Zero” principles to be applied to mathematics
- AlphaProof
- trial and error scaling
- insilico environment for math
- perfect feedback signal for proving
- Finding problems to solve
- human defined
- alphaproof defined
- around human mathematics (auto-formalisation, test-time RL)
- augment human mathematics
- Should training be done in natural language (informal) or formal maths
- informal - large amount of data, unverifiable
- formal - small amount of data, verifiable - bet is that this will be rewarded in the long term
- IMO - hard problems -
- Geometry
- mathlib is sparse in 2D Euclidean geometry
- AlphaGeometry and AlphaProof join forces
- Not all answers are “proofs”
- easy mode - correct answer given by oracle, AI proves it
- hard mode - AI determines the answer, and proves it - generate candidate answers using Gemini
- Input Formalisation - input is given in natural language to participants, but alphaproof receives a formalised variant by lean experts
- Protocol
- Lean experts formalise
- generate O(100) answer candidates with Gemini
- Filter the easily disprovable ones
- Run Test-time RL
- Challenges
- gaps in mathlib - geometry, even p1 (that it solved)
- human data is in natural language
- Combinatorics problems are difficult to formalise and solve
- Methods
- Formaliser Model - output a lean formalisation for an input problem - Auto formalisation - generate lean “variants” for it (100s that are close to the human problem)
- Prover Model - Output N solving tactics, for an input lean state - trained on Mathlib - 300k lines of proof - learn a good prior of actions to take (supervised training) - but mathlib does not have IMO style problems
- Train Prover model by RL - to solve for IMO problems and also generalise
- Fir each formal problem
- generate experience of (dis)proving by searching over lean steps
- Use Lean to verify proofs
- Reinforce the prover network with each success
- Fir each formal problem
- Prover Model + AlphaZero Search
- Search over actions (one line in the proof)
- Compute new lean state after every action/tactic -lean updates the state
- Explore high prior, high value paths but also explore low visited path
- Test-Time RL-
- For each problem
- Generate variants of the problem
- Run RL exactly as previous step
- For each problem
- Geometry
- trial and error scaling
-——–
https://rdi.berkeley.edu/adv-llm-agents/sp25
Written on May 29, 2025
