Hume’s Octave Claims to Outperform ElevenLabs in Capturing Human-Like Emotions in AI Voices


Published on February 27, 2025

The speech-language model can predict the tune, rhythm, and timbre of speech.

Octave, short for Omni-Capable Text and Voice Engine, is a large language model (LLM) developed by Hume AI and tailored for text-to-speech tasks.

The launch comes shortly after ElevenLabs released Scribe, its new speech-to-text model.

According to the company, the model does not just read words aloud but also understands their context, which lets it produce more expressive AI voices. It can generate new voices from text prompts, act out characters, and follow instructions to adjust emotion and speaking style.

The speech-language model predicts the tune, rhythm, and timbre of speech. It can also pick up on plot twists, emotional cues, and character traits in a script or prompt.

Prompts can be nuanced, such as a request for a “patient, empathetic counsellor with an ASMR voice”, which allows for highly specific tonalities. The platform’s ‘Action Instructions’ feature also lets users adjust the emotion or style of an existing voice, for example by asking it to “sound sarcastic”.

Hume recently ran its own blind comparison study with 180 human raters across a diverse set of 120 prompts. Raters preferred Octave’s outputs over those generated by ElevenLabs’ Voice Design 71.6% of the time for audio quality, 51.7% of the time for naturalness, and 57.7% of the time for how well the voice matched the intended prompt.

Voice cloning is not yet available, but the company said it is coming soon and will let users clone a voice from as little as five seconds of audio.

Octave is available on Hume’s official platform and through its API. Users can also choose from a library of more than 40 premade voices and try the project interface, currently in preview, to generate long-form content such as audiobooks and podcasts.

For now, the model focuses on English-language speech but can also speak Spanish. Hume plans to extend its capabilities to other languages soon.

Alongside Octave, Hume AI has introduced the Expressive TTS Arena, a public evaluation platform inspired by Hugging Face’s TTS Arena.