Mathematical reasoning is a vital aspect of human cognitive abilities, driving progress in scientific discoveries and technological developments. As we strive to develop artificial general intelligence that matches human cognition, equipping AI with advanced mathematical reasoning capabilities is essential. While current AI systems can handle basic math problems, they struggle with the complex reasoning needed for advanced mathematical disciplines like algebra and geometry. However, this might be changing, as Google DeepMind has made significant strides in advancing an AI system’s mathematical reasoning capabilities. This breakthrough is made at the International Mathematical Olympiad (IMO) 2024. Established in 1959, the IMO is the oldest and most prestigious mathematics competition, challenging high school students worldwide with problems in algebra, combinatorics, geometry, and number theory. Each year, teams of young mathematicians compete to solve six very challenging problems. This year, Google DeepMind introduced two AI systems: AlphaProof, which focuses on formal mathematical reasoning, and AlphaGeometry 2, which specializes in solving geometric problems. These AI systems managed to solve four out of six problems, performing at the level of a silver medalist. In this article, we will explore how these systems work to solve mathematical problems.
AlphaProof: Combining AI and Formal Language for Mathematical Theorem Proving
AlphaProof is an AI system designed to prove mathematical statements using the formal language Lean. It integrates Gemini, a pre-trained language model, with AlphaZero, a reinforcement learning algorithm renowned for mastering chess, shogi, and Go.
The Gemini model translates natural language problem statements into formal ones, creating a library of problems with varying difficulty levels. This serves two purposes: converting imprecise natural language into precise formal language for verifying mathematical proofs and using predictive abilities of Gemini to generate a list of possible solutions with formal language precision.
When AlphaProof encounters a problem, it generates potential solutions and searches for proof steps in Lean to verify or disprove them. This is essentially a neuro-symbolic approach, where the neural network, Gemini, translates natural language instructions into the symbolic formal language Lean to prove or disprove the statement. Similar to AlphaZero’s self-play mechanism, where the system learns by playing games against itself, AlphaProof trains itself by attempting to prove mathematical statements. Each proof attempt refines AlphaProof’s language model, with successful proofs reinforcing the model’s capability to tackle more challenging problems.
For the International Mathematical Olympiad (IMO), AlphaProof was trained by proving or disproving millions of problems covering different difficulty levels and mathematical topics. This training continued during the competition, where AlphaProof refined its solutions until it found complete answers to the problems.
AlphaGeometry 2: Integrating LLMs and Symbolic AI for Solving Geometry Problems
AlphaGeometry 2 is the latest iteration of the AlphaGeometry series, designed to tackle geometric problems with enhanced precision and efficiency. Building on the foundation of its predecessor, AlphaGeometry 2 employs a neuro-symbolic approach that merges neural large language models (LLMs) with symbolic AI. This integration combines rule-based logic with the predictive ability of neural networks to identify auxiliary points, essential for solving geometry problems. The LLM in AlphaGeometry predicts new geometric constructs, while the symbolic AI applies formal logic to generate proofs.
When faced with a geometric problem, AlphaGeometry’s LLM evaluates numerous possibilities, predicting constructs crucial for problem-solving. These predictions serve as valuable clues, guiding the symbolic engine toward accurate deductions and advancing closer to a solution. This innovative approach enables AlphaGeometry to address complex geometric challenges that extend beyond conventional scenarios.
One key enhancement in AlphaGeometry 2 is the integration of the Gemini LLM. This model is trained from scratch on significantly more synthetic data than its predecessor. This extensive training equips it to handle more difficult geometry problems, including those involving object movements and equations of angles, ratios, or distances. Additionally, AlphaGeometry 2 features a symbolic engine that operates two orders of magnitude faster, enabling it to explore alternative solutions with unprecedented speed. These advancements make AlphaGeometry 2 a powerful tool for solving intricate geometric problems, setting a new standard in the field.
AlphaProof and AlphaGeometry 2 at IMO
This year at the International Mathematical Olympiad (IMO), participants were tested with six diverse problems: two in algebra, one in number theory, one in geometry, and two in combinatorics. Google researchers translated these problems into formal mathematical language for AlphaProof and AlphaGeometry 2. AlphaProof tackled two algebra problems and one number theory problem, including the most difficult problem of the competition, solved by only five human contestants this year. Meanwhile, AlphaGeometry 2 successfully solved the geometry problem, though it did not crack the two combinatorics challenges
Each problem at the IMO is worth seven points, adding up to a maximum of 42. AlphaProof and AlphaGeometry 2 earned 28 points, achieving perfect scores on the problems they solved. This placed them at the high end of the silver-medal category. The gold-medal threshold this year was 29 points, reached by 58 of the 609 contestants.
Next Leap: Natural Language for Math Challenges
AlphaProof and AlphaGeometry 2 have showcased impressive advancements in AI’s mathematical problem-solving abilities. However, these systems still rely on human experts to translate mathematical problems into formal language for processing. Additionally, it is unclear how these specialized mathematical skills might be incorporated into other AI systems, such as for exploring hypotheses, testing innovative solutions to longstanding problems, and efficiently managing time-consuming aspects of proofs.
To overcome these limitations, Google researchers are developing a natural language reasoning system based on Gemini and their latest research. This new system aims to advance problem-solving capabilities without requiring formal language translation and is designed to integrate smoothly with other AI systems.
The Bottom Line
The performance of AlphaProof and AlphaGeometry 2 at the International Mathematical Olympiad is a notable leap forward in AI’s capability to tackle complex mathematical reasoning. Both systems demonstrated silver-medal-level performance by solving four out of six challenging problems, demonstrating significant advancements in formal proof and geometric problem-solving. Despite their achievements, these AI systems still depend on human input for translating problems into formal language and face challenges of integration with other AI systems. Future research aims to enhance these systems further, potentially integrating natural language reasoning to extend their capabilities across a broader range of mathematical challenges.