- Published on April 17, 2025
- In AI News
The models constitute IBM’s most multimodal release yet.

IBM has announced the release of the Granite 3.3 AI model lineup. Granite Speech 3.3 8B, a speech-to-text (STT) model that excels in automatic speech recognition (ASR) and automatic speech translation (AST), is in the spotlight.
The STT model is built on top of Granite 3.3 8B Instruct, a large language model that is also available in a 2B sibling version. The Instruct models feature improved reasoning abilities. Moreover, their base models, Granite 3.3 8B Base and Granite 3.3 2B Base, are also available for developers to fine-tune.
All the models are released open source under an Apache 2.0 license.
Granite Speech 3.3 includes a speech encoder, a speech projector, an LLM, and low-rank adaptation (LoRA) adapters.
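To illustrate what a LoRA adapter does in general, here is a minimal sketch in plain Python. It is not IBM's implementation: the function names, the toy matrices, and the 4096-wide layer with rank 8 are all hypothetical values chosen for illustration. It shows the standard LoRA idea of leaving a frozen weight matrix W untouched and adding a scaled low-rank update B·A on top of it.

```python
# Minimal LoRA sketch (illustrative only; names and sizes are assumptions,
# not taken from the Granite Speech 3.3 code or configuration).

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute y = W x + (alpha / r) * B (A x), where r is the adapter rank.

    W: frozen base weights, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    rank = len(A)
    base = matvec(W, x)               # output of the frozen layer
    update = matvec(B, matvec(A, x))  # low-rank correction
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_out, d_in, rank):
    """Number of trainable parameters in one adapter (A plus B)."""
    return rank * (d_in + d_out)


# Toy example: a 2x2 identity layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [0.0]]
y = lora_forward(W, A, B, [2.0, 3.0])

# Hypothetical 4096x4096 layer: full matrix vs. rank-8 adapter.
full_params = 4096 * 4096
adapter_params = lora_param_count(4096, 4096, rank=8)
```

Under these assumed sizes the adapter trains far fewer parameters than the full matrix, which is why adapters are a common way to add a capability to an existing LLM without retraining it.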
IBM described the speech model as a compact and cost-efficient audio-in (and text-in), text-out STT model tailored for enterprise use cases. The company said that Granite Speech 3.3 is more accurate than leading open and closed model competitors when tested on notable public datasets.
Granite Speech 3.3 8B also achieved a lower error rate than those competitors on transcription tasks in the benchmark tests.
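Transcription error is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The article does not say which metric or text normalization IBM's benchmarks used, so the following is only a sketch of the usual definition, with a hypothetical function name and simple lowercase, whitespace-split normalization assumed.

```python
# Word error rate via word-level edit distance (illustrative sketch;
# the normalization here is an assumption, not IBM's benchmark setup).

def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A lower value means fewer transcription mistakes. WER can exceed 1.0 when the hypothesis contains many inserted words.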

The model also provides automated translation from English to a diverse set of languages, including French, Spanish, Italian, German, Portuguese, Japanese, and Mandarin, achieving performance on par with proprietary models like OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash on supported languages.

To help improve Granite-driven applications, IBM has released retrieval-augmented generation-focused LoRA adapters for the previously released Granite 3.2 8B Instruct. These can be accessed on Hugging Face as part of Granite Experiments.
As part of the announcement, IBM listed several areas for improvement. Currently, the audio encoder for the speech model supports only English, so the company is looking to add multilingual encoding.
Other planned refinements include data recipes with higher-quality training data and a unified structure for integrating audio features across training stages. The company also plans to support speech emotion recognition (SER).
IBM added that it is training Granite 4.0, a new generation of models that aims to deliver significant gains in speed, context length, and capacity.
Ankush Das
I am a tech aficionado and a computer science graduate with a keen interest in AI, Open Source, and Cybersecurity.