Meta Releases First Two Multimodal Llama 4 Models, Plans Two Trillion Parameter Model

  • Published on April 6, 2025
  • In AI News

Meta has announced the release of two new open-weight multimodal models—Llama 4 Scout and Llama 4 Maverick. Both models are now available for download on llama.com and Hugging Face and can be accessed via Meta AI products on WhatsApp, Messenger, Instagram Direct, and the Meta AI website.

Llama 4 Scout and Maverick are built on a mixture-of-experts (MoE) architecture, making them Meta’s most advanced models released to date. Llama 4 Scout features 17 billion active parameters and 16 experts, designed to fit on a single H100 GPU. According to Meta, it supports an industry-leading 10 million token context window, enabling complex tasks such as multi-document summarisation and reasoning over large codebases.
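The key idea behind an MoE layer is that only a few “expert” sub-networks run per token, which is how a model can have a large total parameter count while keeping active parameters (and per-token compute) small. The following is a minimal generic sketch of top-k expert routing; the routing scheme, expert shapes, and all names here are illustrative, not Meta’s actual implementation:

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=1):
    """Generic sketch of mixture-of-experts routing.

    x:        (d,) token hidden state
    experts:  list of callables, each a small feed-forward "expert"
    router_w: (num_experts, d) router weight matrix
    """
    logits = router_w @ x                      # score every expert for this token
    top = np.argsort(logits)[-top_k:]          # select the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the selected experts only
    # Only the chosen experts execute, so active parameters per token
    # stay small even though total parameters are large.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Toy demo: 16 experts (as in Scout), each a random linear map; one token of width 8.
rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [lambda x, W=rng.normal(size=(d, d)): W @ x for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
out = moe_layer(rng.normal(size=d), experts, router_w, top_k=1)
print(out.shape)  # (8,)
```

With `top_k=1`, each token touches just one of the 16 experts, which is the sense in which a model like Scout can have far more total parameters than active ones.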

Meta said, “Scout is our most efficient model ever in its class. It delivers performance that surpasses Llama 3 while being more scalable.” According to the company, the model achieves better results than competing systems, including Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1, on widely reported benchmarks.

Llama 4 Maverick, also a 17 billion active parameter model but with 128 experts, is designed for higher-end use cases. It includes 400 billion total parameters and performs competitively with larger models like DeepSeek V3 on reasoning and coding tasks. Meta said that Maverick exceeds GPT-4o and Gemini 2.0 Flash on several benchmarks. It scored an Elo of 1417 on LMArena in an experimental chat configuration.

Meta chief Mark Zuckerberg described Maverick as the “workhorse,” built for larger-scale tasks. He said it “beats GPT-4o and Gemini Flash 2 on all benchmarks” while remaining “smaller and more efficient than DeepSeek-V3.”

“These models represent a step forward in balancing performance and cost,” Meta said. “Maverick can run on a single H100 host or scale to distributed inference, offering developers flexibility.”

The models were distilled from Llama 4 Behemoth, a yet-unreleased teacher model that is also a multimodal mixture-of-experts model, with 288 billion active parameters, 16 experts, and nearly two trillion total parameters. Behemoth is still in training but has already demonstrated top-tier results on STEM benchmarks such as MATH-500 and GPQA Diamond, outperforming GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro.

Meta noted that Behemoth will not be released yet, but it played a central role in shaping the smaller models through a process called codistillation. The training involved innovations such as a novel distillation loss function and dynamic data selection strategies.
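In distillation, the smaller “student” model is trained to match the output distribution of the larger “teacher” in addition to the usual hard labels. The sketch below uses a standard fixed-weight blend of soft-target and hard-label cross-entropy as a stand-in; Meta describes a novel loss that dynamically weights these terms, so the exact formulation, constants, and names here are assumptions:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Standard distillation objective: blend soft targets from the teacher
    with ordinary hard-label cross-entropy. The fixed alpha here is a
    simplification of the dynamic weighting Meta describes."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -np.sum(p_teacher * np.log(p_student + 1e-12)) * T * T  # soft-target CE
    hard = -np.log(softmax(student_logits)[hard_label] + 1e-12)    # hard-label CE
    return alpha * soft + (1 - alpha) * hard

loss = distill_loss(np.array([2.0, 0.5, -1.0]),
                    np.array([1.5, 0.8, -0.5]), hard_label=0)
print(loss > 0)  # True
```

The temperature `T` softens both distributions so the student also learns from the teacher’s relative confidences, not just its top prediction.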

Zuckerberg said the company will next release the Llama 4 reasoning model. He added that details will be shared next month.

The company also shared new architectural insights. Both Scout and Maverick use interleaved attention layers without positional embeddings and a technique called inference-time temperature scaling to generalise across longer input sequences. The models were pre-trained on diverse multimodal data, including image and video frame stills, and support multimodal interactions across multiple images and text.
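One way to read “inference-time temperature scaling” is that attention logits are rescaled as a function of sequence position so behaviour stays stable beyond the training context length. The schedule and constants below are purely illustrative assumptions, not Meta’s published method:

```python
import numpy as np

def scaled_attention_scores(q, k, position, alpha=0.1, floor=8192):
    """Sketch of length-dependent attention temperature scaling.

    The logarithmic schedule and the alpha/floor constants are
    illustrative: the scale stays at 1.0 within a short-context floor,
    then grows slowly with token position."""
    scale = 1.0 / np.sqrt(q.shape[-1])                 # standard 1/sqrt(d) scaling
    temp = 1.0 + alpha * np.log(max(position, floor) / floor)
    return (q @ k.T) * scale * temp

# Scores for the same query/key pair at a short vs. very long position.
q = np.ones((1, 64))
k = np.ones((4, 64))
short = scaled_attention_scores(q, k, position=1024)
long = scaled_attention_scores(q, k, position=1_000_000)
print((long > short).all())  # under this schedule, logits sharpen at long range
```

The intent of such a schedule is to counteract attention scores flattening out over very long sequences, which is one plausible reading of how a 10 million token window is made usable.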

In terms of training methodology, Meta introduced a lightweight supervised fine-tuning (SFT) approach followed by online reinforcement learning (RL) and direct preference optimisation (DPO). For Maverick, over 50% of SFT data was filtered out to focus on harder examples, improving the model’s performance in reasoning and conversation.
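Filtering out easy SFT examples can be sketched as scoring each example with a judge and keeping only the hardest fraction. The `judge` callable and the difficulty field below are hypothetical; Meta has not published the mechanics of its filtering:

```python
def filter_hard_examples(dataset, judge, keep_fraction=0.5):
    """Keep only the hardest fraction of SFT examples.

    `judge` is a hypothetical callable returning a difficulty score;
    in practice this role might be played by a capable model grading
    each example, but that is an assumption here."""
    scored = sorted(dataset, key=judge, reverse=True)   # hardest first
    cutoff = max(1, int(len(scored) * keep_fraction))
    return scored[:cutoff]

# Toy demo with a difficulty score stored on each example.
data = [{"prompt": f"q{i}", "difficulty": i / 10} for i in range(10)]
hard = filter_hard_examples(data, judge=lambda ex: ex["difficulty"])
print(len(hard))  # 5
```

Dropping more than half the data this way concentrates fine-tuning signal on examples the model does not already handle, which matches the rationale Meta gives for Maverick.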

Meta highlighted the strategic importance of openness in its release. “We believe openness drives innovation and benefits everyone,” the company said. Llama 4 Scout and Maverick are being released under open terms, with broader access expected soon through cloud providers and partners.

The announcement comes ahead of LlamaCon, scheduled for April 29, where Meta plans to share more about its vision for the future of the Llama platform.

“This is just the beginning,” Meta stated. “We’re building models that can reason, understand images, and converse naturally to support the next generation of applications.”

Siddharth Jindal

Siddharth is a media graduate who loves exploring tech through journalism and putting forward ideas worth pondering in the era of artificial intelligence.
