Hume’s Octave Claims to Outperform ElevenLabs in Capturing Human-Like Emotions in AI Voices

  • Published on February 27, 2025
  • In AI News

The speech-language model can predict the tune, rhythm, and timbre of speech.

Octave, short for Omni-Capable Text and Voice Engine, is an LLM developed by Hume AI tailored for text-to-speech tasks.

The launch comes as ElevenLabs has released its own new speech-to-text model, Scribe.

Hume said that Octave not only reads words but also understands their context, which it uses to produce more expressive AI voices. The model generates voices from prompts, acts out characters, and takes instructions to adjust emotion and style.

The speech-language model can predict the tune, rhythm, and timbre of speech. It can also pick up plot twists, emotional cues, and character traits from the script or prompt.

The prompts can be nuanced, such as a request for a “patient, empathetic counsellor with an ASMR voice”, allowing for highly specific tonalities. Furthermore, the platform’s ‘Action Instructions’ feature lets users adjust the emotion or style of an existing voice, for example by asking it to “sound sarcastic”.

Hume recently organised a blind comparison study with 180 human raters across a diverse set of 120 prompts. In the study, Octave’s outputs were favoured over those generated by ElevenLabs’ Voice Design in several key aspects: audio quality (71.6%), naturalness (51.7%), and how well the speech matched the intended prompt (57.7%).

Voice cloning is not yet available, but the company said it is coming soon. The feature will allow users to clone a voice from as little as five seconds of audio.

Octave is available on Hume’s official portal and through its API. Users can also access a voice library of over 40 premade voices and try out its project interface, which is in preview, to generate long-form content like audiobooks and podcasts.

The model currently focuses on English-language speech but can also speak Spanish. Hume plans to improve its capabilities in other languages soon.

Alongside Octave, Hume AI has introduced the Expressive TTS Arena, a public evaluation platform inspired by Hugging Face’s TTS Arena.

Ankush Das

