- Published on February 27, 2025
The Phi-4 multimodal model supports applications including document analysis and speech recognition.
Microsoft has launched Phi-4-multimodal and Phi-4-mini, the latest additions to its Phi family of small language models (SLMs). These models are now available on Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog.
Phi-4-multimodal is a 5.6 billion-parameter model that integrates speech, vision, and text processing. “By leveraging advanced cross-modal learning techniques, this model enables more natural and context-aware interactions, allowing devices to understand and reason across multiple input modalities simultaneously,” said Weizhu Chen, vice president of generative AI at Microsoft.
Last year, Microsoft launched Phi-4, a 14 billion-parameter model that excels at complex reasoning.
On multimodal audio and visual benchmarks, Phi-4-multimodal surpasses Google’s Gemini 2.0 Flash and Gemini 1.5 Pro. Microsoft claims that it is comparable to OpenAI’s GPT-4o.
Microsoft said Phi-4-multimodal has demonstrated strong performance in speech-related tasks, surpassing models such as WhisperV3 and SeamlessM4T-v2-Large in automatic speech recognition and speech translation. At the time of the announcement, it also ranked first on the Hugging Face OpenASR leaderboard with a word error rate of 6.14%. The model shows competitive results in document and chart understanding, optical character recognition (OCR), and visual science reasoning.
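For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the leaderboard's scoring code: the function name and sample sentences are made up for this example, and real evaluations typically normalise casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substitution and one deletion
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.2%}")
```

A lower figure is better: a score of 6.14% means roughly six word-level errors for every hundred reference words.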
Phi-4-mini, by contrast, is a 3.8 billion-parameter text-based model for reasoning, coding, and long-context tasks. It supports sequences of up to 128,000 tokens and offers efficient processing with reduced computational requirements. It also supports function calling, which allows integration with external tools and APIs.
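Function calling generally works by describing the available tools to the model, letting it reply with a structured request, and having the application run the matching function. The sketch below shows only the application side of that loop, under stated assumptions: the JSON shape of the tool call, the `celsius_to_fahrenheit` tool, and the `dispatch_tool_call` helper are all hypothetical, and the model reply is simulated rather than produced by Phi-4-mini, whose actual prompt and output format should be taken from its model card.

```python
import json


def celsius_to_fahrenheit(celsius: float) -> dict:
    """Hypothetical tool the application exposes to the model."""
    return {"fahrenheit": celsius * 9 / 5 + 32}


# Registry mapping tool names to Python callables (an assumption for
# this sketch, not part of any Phi-4 API).
TOOLS = {"celsius_to_fahrenheit": celsius_to_fahrenheit}


def dispatch_tool_call(model_output: str) -> dict:
    """Parse a JSON tool call and run the matching local function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Simulated model reply; the format is assumed for illustration.
simulated_output = '{"name": "celsius_to_fahrenheit", "arguments": {"celsius": 25}}'
result = dispatch_tool_call(simulated_output)
print(result)
```

In a real integration, the result would be passed back to the model so it can compose its final answer, and unknown or malformed calls would be rejected before anything runs.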
Both models are suited to deployment in compute-constrained environments. They can be optimised with ONNX Runtime for cross-platform availability and lower latency.
Microsoft is incorporating these models into its ecosystem, including Windows applications and Copilot+ PCs. “Copilot+ PCs will build upon Phi-4-multimodal’s capabilities, delivering the power of Microsoft’s advanced SLMs without the energy drain,” said Vivek Pradeep, vice president and distinguished engineer of Windows Applied Sciences.
Developers can access Phi-4-multimodal and Phi-4-mini on multiple platforms and explore their applications in various industries, including finance, healthcare, and automotive technology.
Siddharth Jindal
Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.