- Published on April 8, 2025
- In AI News
Llama 4 models, including Scout and Maverick, are now live on Krutrim Cloud, allowing developers to build and deploy AI applications at competitive pricing.

Ola chief Bhavish Aggarwal announced on Tuesday that Krutrim Cloud will run Meta’s Llama 4 models entirely on India-hosted cloud infrastructure. The move will allow developers across the country to access advanced AI capabilities while maintaining full data sovereignty.
“Excited to share that Krutrim is among the world’s first to host Meta’s Llama 4 models running entirely on its India-hosted cloud. Powering our developers with world-class AI, at industry-disrupting prices with complete data sovereignty,” he said in a post on X.
In a separate LinkedIn post, he said that the company is deploying both the Llama 4 Scout and Llama 4 Maverick models at even more disruptive prices of just ₹7 to ₹17 per million tokens. “This isn’t just about cost savings – it’s about democratising access to cutting-edge AI for every Indian developer and startup,” he said.
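To put the quoted price range in perspective, here is a minimal sketch of the arithmetic in Python. The ₹7 and ₹17 figures come from the post; the monthly token volume is a hypothetical workload, and the sketch assumes a single flat rate per million tokens, since the post does not say which price applies to which model or whether input and output tokens are billed differently.

```python
def estimate_cost_inr(tokens: int, price_per_million_inr: float) -> float:
    """Return the cost in rupees for `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_inr


# Price range quoted in the LinkedIn post (rupees per million tokens).
LOW_PRICE_INR = 7.0
HIGH_PRICE_INR = 17.0

# Hypothetical workload, chosen only for illustration.
monthly_tokens = 250_000_000

low = estimate_cost_inr(monthly_tokens, LOW_PRICE_INR)
high = estimate_cost_inr(monthly_tokens, HIGH_PRICE_INR)

print(f"Estimated monthly cost: ₹{low:,.0f} to ₹{high:,.0f}")
```

Under these assumptions, a workload of 250 million tokens a month would cost somewhere between ₹1,750 and ₹4,250.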
The models are hosted within India’s borders, aligning with growing demands for localised data control and privacy.
Krutrim Cloud, launched last year, provides a comprehensive suite of AI services, including Model-as-a-Service (MaaS) and GPU-as-a-Service. It recently added support for DeepSeek models as well.
Meta recently launched two multimodal open-weight models—Llama 4 Scout and Llama 4 Maverick. Both models are built on a mixture-of-experts (MoE) setup.
Llama 4 Scout features 17 billion active parameters and 16 experts, designed to fit within a single H100 GPU. Meta claims it supports an industry-leading 10 million token context window, enabling complex tasks such as multi-document summarisation and reasoning over large codebases.
Llama 4 Maverick is a 17 billion active parameter model with 128 experts. It has 400 billion total parameters and performs competitively with larger models like DeepSeek V3 on reasoning and coding tasks. Meta said that Maverick exceeds GPT-4o and Gemini 2.0 Flash on several benchmarks. It scored an Elo of 1417 on LMArena in experimental chat settings.
The models were distilled from Llama 4 Behemoth. This unreleased teacher model is also a multimodal mixture-of-experts model, with 288B active parameters, 16 experts, and nearly two trillion total parameters.
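In a mixture-of-experts model, only a subset of the parameters is used for each token, which is why the active and total counts differ so widely. The sketch below works out that ratio from the figures quoted above. Behemoth's total is taken as a round two trillion, which is an approximation of "nearly two trillion"; Scout is left out because its total parameter count is not given here.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Return the share of a model's parameters that are active per token."""
    return active_params / total_params


# Figures as quoted in the article.
maverick = active_fraction(17e9, 400e9)

# Assumption: "nearly two trillion" is rounded to 2e12, so this is approximate.
behemoth = active_fraction(288e9, 2e12)

print(f"Maverick: {maverick:.2%} of parameters active per token")
print(f"Behemoth: roughly {behemoth:.1%} of parameters active per token")
```

On these numbers, Maverick activates about 4% of its parameters per token, and Behemoth roughly 14%.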
Questions were also raised about the model’s training and testing data, including claims that Meta had trained on test sets. Ahmad Al-Dahle, the lead of GenAI at Meta, later rejected them. “That’s simply not true, and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilise implementations,” he said.
Siddharth Jindal
Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.