- Published on December 13, 2024
phi-4 offers performance comparable to several leading large language models.
Microsoft has launched their latest small model, the phi-4, with 14 billion parameters. The model is said to ‘excel’ at complex reasoning capabilities. It is currently available on Azure AI Foundry and will be available on Hugging Face from next week onwards. Microsoft has also released a detailed technical report for phi-4.
The phi-4 offers strong competition to leading small language models and also gives large frontier models a run for their money. Microsoft attributes its performance to the use of high-quality synthetic datasets and post-training innovations. In math competition problems, phi-4 outperformed Gemini 1.5 Pro and OpenAI’s GPT-4o.
Surprise #NeurIPS2024 drop for y'all: phi-4 available open weights and with amazing results!!!
Tl;dr: phi-4 is in Llama 3.3-70B category (win some lose some) with 5x fewer parameters, and notably outperforms on pure reasoning like GPQA (56%) and MATH (80%).
“Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size—especially on reasoning-focused benchmarks—due to improved data, training curriculum, and innovations in the post-training scheme,” read the technical report from Microsoft.
Notably, the phi-4 model also offers performance in the range of Meta’s newly released Llama 3.3 models. In fact, as per benchmarks, phi-4 outperforms Llama 3.3 in reasoning and math.
phi-4 is Microsoft’s successor to the phi-3.5 models that were released earlier this year.
Microsoft’s announcement comes just days after Google launched its small model, Gemini 2.0 Flash. While Microsoft hasn’t officially compared phi-4 with Gemini 2.0 Flash, Google’s model achieved a 62.1% score in the GPQA reasoning benchmark, compared to phi-4’s 56.1%.
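For readers who want to check the arithmetic behind these comparisons, here is a minimal sketch in Python. It uses only figures quoted in this article (14 billion parameters for phi-4, 70 billion for Llama 3.3-70B as its name indicates, and the two GPQA scores). The dictionary layout and the function names `size_ratio` and `gpqa_gap` are illustrative assumptions, not part of any Microsoft or Google tooling.

```python
# Figures as quoted in the article; the data layout is illustrative only.
PARAMS_BILLIONS = {"phi-4": 14, "Llama 3.3-70B": 70}
GPQA_SCORES = {"phi-4": 56.1, "Gemini 2.0 Flash": 62.1}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times more parameters `larger` has than `smaller`."""
    return PARAMS_BILLIONS[larger] / PARAMS_BILLIONS[smaller]


def gpqa_gap(a: str, b: str) -> float:
    """Return the GPQA score of `a` minus that of `b`, in percentage points."""
    # Round to one decimal place to avoid floating-point noise.
    return round(GPQA_SCORES[a] - GPQA_SCORES[b], 1)


if __name__ == "__main__":
    ratio = size_ratio("Llama 3.3-70B", "phi-4")
    gap = gpqa_gap("Gemini 2.0 Flash", "phi-4")
    print(f"Llama 3.3-70B has {ratio:g}x the parameters of phi-4")
    print(f"Gemini 2.0 Flash leads phi-4 on GPQA by {gap} points")
```

The ratio matches the "5x fewer parameters" claim, and the GPQA gap works out to six percentage points in Gemini 2.0 Flash's favour.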
Google is also going toe-to-toe with Microsoft with its latest Project Mariner, which not only rivals Copilot Vision but goes a step further. Unlike Copilot Vision, Project Mariner is also capable of autonomously navigating a web browser tab.
phi-4 will also compete with Anthropic’s Claude 3.5 Haiku, which was made available via the web and mobile app for all users yesterday. The phi-4 model outperforms Claude 3.5 Haiku on several benchmarks.
Small models may finally deliver on their promise. It is about time we see them on more and more devices, letting users run AI models locally.
Supreeth Koundinya
Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.