Microsoft’s phi-4 is a Monstrous Small Model

Published on December 13, 2024
In AI News

It offers performance comparable to multiple leading large language models.

Microsoft has launched its latest small model, phi-4, with 14 billion parameters. The model is said to ‘excel’ at complex reasoning. It is currently available on Azure AI Foundry and will be available on Hugging Face next week. Microsoft has also released a detailed technical report for phi-4.

The phi-4 offers strong competition to leading small language models and also gives large frontier models a run for their money. Microsoft attributes its performance to the use of high-quality synthetic datasets and post-training innovations. In math competition problems, phi-4 outperformed Gemini 1.5 Pro and OpenAI’s GPT-4o.

Surprise #NeurIPS2024 drop for y'all: phi-4 available open weights and with amazing results!!!

Tl;dr: phi-4 is in Llama 3.3-70B category (win some lose some) with 5x fewer parameters, and notably outperforms on pure reasoning like GPQA (56%) and MATH (80%). pic.twitter.com/nGaOTmuKY3

— Sebastien Bubeck (@SebastienBubeck) December 13, 2024

“Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size—especially on reasoning-focused benchmarks—due to improved data, training curriculum, and innovations in the post-training scheme,” read the technical report from Microsoft.

Notably, phi-4 also performs in the same range as Meta’s newly released Llama 3.3 models. In fact, as per benchmarks, phi-4 outperforms Llama 3.3 in reasoning and math.

phi-4 is Microsoft’s successor to the phi-3.5 models that were released earlier this year.

Microsoft’s announcement comes just days after Google launched its small model, Gemini 2.0 Flash. While Microsoft hasn’t officially compared phi-4 with Gemini 2.0 Flash, the Google model achieved a 62.1% score on the GPQA reasoning benchmark, compared to phi-4’s 56.1%.

Google is also going toe-to-toe with Microsoft with its latest Project Mariner, which not only rivals Copilot Vision but goes a step further. Unlike Copilot Vision, Project Mariner can also autonomously navigate a web browser tab.

phi-4 will also compete with Anthropic’s Claude 3.5 Haiku, which was made available via the web and mobile apps for all users yesterday. phi-4 outperforms Claude 3.5 Haiku on several benchmarks.

Small models may finally deliver on their promise. It is about time we saw them on more devices that let users access AI models locally.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.