- Published on January 9, 2025
- In AI News
Smaller models are easier to use, require less powerful hardware, and make advanced AI tools available to more people and organisations
Microsoft researchers have developed ‘rStar-Math’, a method that enables small language models (SLMs) to solve challenging math problems with remarkable accuracy, matching or even surpassing larger models like OpenAI’s o1. Instead of relying on knowledge distillation from bigger models, rStar-Math allows smaller models to improve independently through self-evolution.
“Our work demonstrates that small language models can achieve frontier-level performance in math reasoning through self-evolution and careful step-by-step verification,” the researchers said in the paper.
Why does this matter? Lower hardware requirements and simpler deployment put advanced AI tools within reach of more people and organisations. Smaller models are especially useful in areas like education, math, coding, and research, where accurate, step-by-step reasoning is crucial.
The open-source release of rStar-Math and Microsoft’s Phi-4 model on Hugging Face allows others to customise and use these tools for a wide range of applications, making AI more affordable and accessible.
The system uses Monte Carlo Tree Search (MCTS), a search strategy best known from game-playing AI for Go and chess, to break problems into smaller, manageable steps. Each step is validated by executing code, avoiding the common issue of models producing correct answers through flawed reasoning.
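The step-level verification idea can be sketched in a few lines of toy Python. This is an illustration only: `verify_step` and `rollout` are hypothetical names, and the real system wraps this check inside a full MCTS search driven by a language model.

```python
def verify_step(code: str, env: dict) -> bool:
    """Execute one candidate reasoning step's code snippet in a shared
    environment; the step is kept only if it runs without error,
    mirroring how rStar-Math discards steps whose code fails."""
    try:
        exec(code, {}, env)
        return True
    except Exception:
        return False

def rollout(steps):
    """Walk one candidate trajectory, keeping only verified steps.

    Returns the environment built up by the surviving steps plus the
    list of steps that passed verification."""
    env, verified = {}, []
    for code in steps:
        if verify_step(code, env):
            verified.append(code)
    return env, verified

# A trajectory with a broken middle step: the bad step is pruned,
# but later steps that depend only on verified state still run.
env, kept = rollout(["x = 12 * 3", "y = x / 0", "z = x + 6"])
```

Because a step is only admitted if its code actually executes, a trajectory can reach the final answer only through verifiable intermediate states.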
Features of rStar-Math: rStar-Math incorporates three innovations to improve performance. First, MCTS rollouts generate verified step-by-step training data. Second, a process preference model (PPM) evaluates and guides intermediate steps by ranking them against one another, rather than relying on imprecise absolute scores. Third, the system evolves iteratively over four rounds, refining both the models and the training data to solve increasingly complex problems.
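The PPM's relative ranking of steps can be illustrated with a toy pairwise selector. The names here are hypothetical, and the actual PPM is a trained neural model; the sketch only shows the design choice of comparing candidate steps against each other instead of scoring each in isolation.

```python
def pick_preferred(candidates, prefer):
    """Return the candidate that wins a simple pairwise tournament.

    `prefer(a, b)` reports whether step `a` beats step `b`; this mirrors
    how a process preference model ranks intermediate steps relative to
    one another rather than assigning each an absolute score."""
    best = candidates[0]
    for cand in candidates[1:]:
        if prefer(cand, best):
            best = cand
    return best

# Toy preference: among candidate derivation steps, prefer the one
# whose intermediate value lands closest to the target of 42.
steps = {"x = 40": 40, "x = 41": 41, "x = 39": 39}
closer_to_42 = lambda a, b: abs(steps[a] - 42) < abs(steps[b] - 42)
best_step = pick_preferred(list(steps), closer_to_42)  # "x = 41"
```

Pairwise comparisons sidestep the need for precise per-step reward labels, which are notoriously noisy to annotate.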
On the MATH benchmark, accuracy increased from 58.8% to 90%, outperforming OpenAI's o1-preview. The system also solved 53.3% of problems on the AIME (American Invitational Mathematics Examination), a qualifier for the USA Math Olympiad, placing it among the top 20% of high school competitors. It performed strongly on other benchmarks as well, including GSM8K, Olympiad Bench, and college-level challenges.
The study highlights the potential of smaller AI models to achieve advanced reasoning capabilities typically associated with larger systems. It also shows how such models can develop intrinsic self-reflection, enabling them to identify and correct errors during problem-solving.
The framework, along with its code and data, is open-source and available on GitHub. This makes it accessible to researchers and developers, paving the way for smaller, more efficient AI systems capable of handling complex reasoning tasks.
Aditi Suresh
I hold a degree in political science, and am interested in how AI and online culture intersect. I can be reached at [email protected]