- Published on January 9, 2025
- In AI News
Smaller models are easier to use, require less powerful hardware, and make advanced AI tools available to more people and organisations
Microsoft researchers have developed ‘rStar-Math’, a method that enables small language models (SLMs) to solve challenging math problems with remarkable accuracy, matching or even surpassing larger models like OpenAI’s o1. Instead of relying on knowledge distillation from bigger models, rStar-Math allows smaller models to improve independently through self-evolution.
“Our work demonstrates that small language models can achieve frontier-level performance in math reasoning through self-evolution and careful step-by-step verification,” the researchers said in the paper.
Why does this matter? Lower hardware requirements and simpler deployment put advanced AI tools within reach of more people and organisations. Smaller models are especially useful in areas like education, math, coding, and research, where accurate, step-by-step reasoning is crucial.
The open-source release of rStar-Math and Microsoft’s Phi-4 model on Hugging Face allows others to customise and use these tools for a wide range of applications, making AI more affordable and accessible.
The system uses Monte Carlo Tree Search (MCTS), a search strategy best known from game-playing AI for Go and chess, to break problems into smaller, manageable steps. Each step is validated by executing code, avoiding the common issue of models producing correct answers through flawed reasoning.
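The step-level verification idea can be sketched in a few lines of toy Python. This is an illustration only: `verify_step` and `rollout` are hypothetical names, and the real system wraps this check inside a full MCTS search driven by a language model.

```python
def verify_step(code: str, env: dict) -> bool:
    """Execute one candidate reasoning step's code snippet in a shared
    environment; the step is kept only if it runs without error,
    mirroring how rStar-Math discards steps whose code fails."""
    try:
        exec(code, {}, env)
        return True
    except Exception:
        return False

def rollout(steps):
    """Walk one candidate trajectory, keeping only verified steps.

    Returns the environment built up by the surviving steps plus the
    list of steps that passed verification."""
    env, verified = {}, []
    for code in steps:
        if verify_step(code, env):
            verified.append(code)
    return env, verified

# A trajectory with a broken middle step: the bad step is pruned,
# but later steps that depend only on verified state still run.
env, kept = rollout(["x = 12 * 3", "y = x / 0", "z = x + 6"])
```

Because a step is only admitted if its code actually executes, a trajectory can reach the final answer only through verifiable intermediate states.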
Features of rStar-Math: rStar-Math incorporates three innovations to improve performance. First, MCTS rollouts generate verified step-by-step training data. Second, a process preference model (PPM) evaluates and guides intermediate steps by ranking them against one another, rather than relying on imprecise absolute scores. Third, the system evolves iteratively over four rounds, refining both the models and the training data to solve increasingly complex problems.
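The PPM's relative ranking of steps can be illustrated with a toy pairwise selector. The names here are hypothetical, and the actual PPM is a trained neural model; the sketch only shows the design choice of comparing candidate steps against each other instead of scoring each in isolation.

```python
def pick_preferred(candidates, prefer):
    """Return the candidate that wins a simple pairwise tournament.

    `prefer(a, b)` reports whether step `a` beats step `b`; this mirrors
    how a process preference model ranks intermediate steps relative to
    one another rather than assigning each an absolute score."""
    best = candidates[0]
    for cand in candidates[1:]:
        if prefer(cand, best):
            best = cand
    return best

# Toy preference: among candidate derivation steps, prefer the one
# whose intermediate value lands closest to the target of 42.
steps = {"x = 40": 40, "x = 41": 41, "x = 39": 39}
closer_to_42 = lambda a, b: abs(steps[a] - 42) < abs(steps[b] - 42)
best_step = pick_preferred(list(steps), closer_to_42)  # "x = 41"
```

Pairwise comparisons sidestep the need for precise per-step reward labels, which are notoriously noisy to annotate.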
On the MATH benchmark, accuracy increased from 58.8% to 90%, outperforming OpenAI's o1-preview. The system also solved 53.3% of problems on the AIME (American Invitational Mathematics Examination), a qualifier for the USA Math Olympiad, placing it among the top 20% of high school competitors. It performed strongly on other benchmarks as well, including GSM8K, Olympiad Bench, and college-level challenges.
The study highlights the potential of smaller AI models to achieve advanced reasoning capabilities typically associated with larger systems. It also shows how such models can develop intrinsic self-reflection, enabling them to identify and correct errors during problem-solving.
The framework, along with its code and data, is open-source and available on GitHub. This makes it accessible to researchers and developers, paving the way for smaller, more efficient AI systems capable of handling complex reasoning tasks.
Aditi Suresh
I hold a degree in political science, and am interested in how AI and online culture intersect. I can be reached at [email protected]