OpenAI’s o3-Mini Shows Higher Accuracy Than o1-Mini Without ‘Thinking’ Longer: Harvard Study

  • Published on February 24, 2025
  • In AI News

Thinking harder isn’t the same as thinking longer. 

Illustration by Nalini Nirad

Researchers from Harvard University and Vrije Universiteit Brussel recently released a study titled ‘The Relationship Between Reasoning and Performance in Large Language Models’. The study explores whether longer chains of thought lead to more accurate responses.

The authors compared OpenAI’s o1-mini with o3-mini (medium), one of the company’s newer and more powerful models, on Olympiad-level math problems. The study concluded that o3-mini achieved higher accuracy than o1-mini without using longer reasoning chains.

The authors also reported that response accuracy declined as reasoning chains grew longer.

“This accuracy drop is significantly smaller in more proficient models, suggesting that new generations of reasoning models use test-time compute more effectively,” the report read, indicating that newer models make better use of the compute they spend at inference time.

According to the study, “thinking harder” isn’t the same as “thinking longer”. “A possible hypothesis for this accuracy drop is that models tend to reason more on problems they cannot solve,” read a section of the report. The study also raised the possibility that longer reasoning chains inherently have a higher probability of leading to a wrong solution.
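The idea that longer chains may inherently be more error-prone can be illustrated with a toy calculation. The sketch below is not the study's analysis; it assumes, purely for illustration, that a chain consists of independent steps, each correct with the same probability, and that a single wrong step ruins the answer. The function name and the 0.99 per-step accuracy are hypothetical.

```python
def chain_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that every step in a chain is correct.

    Toy assumption: steps are independent and equally reliable,
    so the chance of an error-free chain is accuracy ** steps.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_accuracy ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step accuracy of 99%, chosen only for illustration.
    for steps in (10, 50, 100, 200):
        probability = chain_success_probability(0.99, steps)
        print(f"{steps:>3} steps -> {probability:.3f}")
```

Under these assumptions, even a highly reliable step compounds into a steadily lower chance of a correct final answer as the chain lengthens. Real models can backtrack and correct themselves, so this only conveys the intuition behind the hypothesis.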

The authors have published a detailed technical paper describing the research.

Over the last few months, the AI industry has been betting big on reasoning models. Most recently, Elon Musk’s xAI announced reasoning capabilities within the latest Grok 3 model. Meanwhile, Anthropic, the company behind the Claude family of models, plans to release a hybrid model with reasoning capabilities soon. 

OpenAI was the first to ship a reasoning model, with the o1 series. Recently, the company announced its latest o3 family of models, touted as the most powerful reasoning models ever made.

While the o3-mini model has been made available, OpenAI plans to unify the o-series and GPT-series models in the future with the release of GPT-5. The company is not planning to release o3 as a standalone model.

Recently, Chinese AI startup DeepSeek shocked the industry when it launched the DeepSeek-R1 model, which offered performance on par with OpenAI’s o1 while being open source and trained at a fraction of the cost.


Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
