IBM Introduces Granite 3.3 Series of AI Models


Published on April 17, 2025

The models constitute IBM’s most multimodal release yet.

IBM has announced the release of the Granite 3.3 AI model lineup. Granite Speech 3.3 8B, a speech-to-text (STT) model that excels in automatic speech recognition (ASR) and automatic speech translation (AST), is in the spotlight.

The STT model is built on top of Granite 3.3 8B Instruct, a large language model that features improved reasoning abilities and is also available in a 2B version. The corresponding base models, Granite 3.3 8B Base and Granite 3.3 2B Base, are available for developers to fine-tune.

All the models are released as open source under the Apache 2.0 license.

Granite Speech 3.3 includes a speech encoder, a speech projector, an LLM, and low-rank adaptation (LoRA) adapters.

IBM described the speech model as a compact, cost-efficient audio-in (and text-in), text-out STT model tailored for enterprise use cases. According to the company, Granite Speech 3.3 is more accurate than leading open and closed competitor models when tested on notable public datasets.

Granite Speech 3.3 8B also achieved a lower error rate than those competitors on transcription tasks, according to the benchmark results.
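For readers unfamiliar with how transcription error is scored, the sketch below shows one common approach. It assumes the metric is word error rate (WER), the usual ASR measure; the announcement as reported here does not name the metric, and the function name and sample sentences are illustrative only, not taken from IBM's benchmarks.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); real evaluations usually apply
    more careful text normalization before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower value means fewer substituted, dropped, or inserted words relative to the reference transcript.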

The model also provides automated translation from English to a diverse set of languages, including French, Spanish, Italian, German, Portuguese, Japanese, and Mandarin, achieving performance on par with proprietary models like OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash on supported languages.

To help improve Granite-driven applications, IBM has also released LoRA adapters focused on retrieval-augmented generation (RAG) for the previously released Granite 3.2 8B Instruct. These can be accessed on Hugging Face as part of Granite Experiments.

As part of the announcement, IBM identified several areas for improvement. The audio encoder for the speech model currently supports only English, so the company is looking to add multilingual encoding.

Other planned refinements include data recipes with higher-quality training data and a unified structure for integrating audio features in the training stages. The company also plans to support speech emotion recognition (SER).

IBM added that it is training Granite 4.0, a new generation of models that aims for significant gains in speed, context length, and capacity.