OpenAI’s o3-Mini Shows Higher Accuracy Than o1-Mini Without ‘Thinking’ Longer: Harvard Study


Published on February 24, 2025

Thinking harder isn’t the same as thinking longer.

Illustration by Nalini Nirad

Researchers from Harvard University and Vrije Universiteit Brussel recently released a study titled ‘The Relationship Between Reasoning and Performance in Large Language Models’. The study explores whether longer chains of thought lead to more accurate responses.

The authors compared OpenAI’s o1-mini with o3-mini (medium), one of the company’s newer and more powerful models, on Olympiad-level math problems. The study concluded that o3-mini outperformed o1-mini without requiring longer reasoning chains.

The authors also found that response accuracy declined as reasoning chains grew longer.

“This accuracy drop is significantly smaller in more proficient models, suggesting that new generations of reasoning models use test-time compute more effectively,” read the report, indicating that newer models make more efficient use of compute while working on a task.

The study attributes the finding to the idea that “thinking harder” isn’t the same as “thinking longer”. “A possible hypothesis for this accuracy drop is that models tend to reason more on problems they cannot solve,” read a section of the report. The authors also raised the possibility that longer reasoning chains inherently have a higher probability of leading to a wrong solution.
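
The relationship between reasoning length and accuracy can be checked on one's own evaluation logs. The following is a minimal Python sketch, assuming each response has been logged as a pair of reasoning token count and whether the answer was correct. The helper name accuracy_by_length is hypothetical, and the records are invented toy values for illustration, not results from the study.

```python
from collections import defaultdict


def accuracy_by_length(records, bucket_size):
    """Group (reasoning_tokens, is_correct) pairs into token-count buckets
    and return {bucket_start: accuracy}, ordered by bucket."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for tokens, is_correct in records:
        # Bucket start, e.g. 2500 tokens with bucket_size 2000 -> 2000
        bucket = (tokens // bucket_size) * bucket_size
        totals[bucket] += 1
        correct[bucket] += int(is_correct)
    return {b: correct[b] / totals[b] for b in sorted(totals)}


# Invented toy records (assumption): NOT data from the study.
toy_records = [
    (800, True), (1200, True), (1900, True), (1500, False),
    (2500, True), (3100, False), (3900, False), (2200, True),
    (4500, False), (5200, False), (4100, True), (5900, False),
]

for bucket, acc in accuracy_by_length(toy_records, 2000).items():
    print(f"{bucket}-{bucket + 1999} tokens: accuracy {acc:.2f}")
```

A falling accuracy across buckets would not by itself separate the two explanations above, since harder problems may simply attract longer reasoning. Running the same grouping within a single problem, across repeated attempts, is one way to control for difficulty.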

A detailed technical document of the research can be found here.

Over the last few months, the AI industry has been betting big on reasoning models. Most recently, Elon Musk’s xAI announced reasoning capabilities within the latest Grok 3 model. Meanwhile, Anthropic, the company behind the Claude family of models, plans to release a hybrid model with reasoning capabilities soon.

OpenAI was the first to ship a reasoning model, the o1 series. Recently, the company announced its latest o3 family of models, touted to be the most powerful reasoning models ever made.

While the o3-mini model has been made available, OpenAI plans to unify the o-series and GPT-series models in the future with the release of GPT-5. The company is not planning to release o3 as a standalone model.

Recently, Chinese AI startup DeepSeek shocked the industry when it launched DeepSeek-R1, a model that offered performance as good as OpenAI’s o1 while being available for open-source use and trained at a fraction of the cost.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.