Deepgram’s New Text-to-Speech AI Model Outperforms ElevenLabs and OpenAI

  • Published on April 15, 2025
  • In AI News

Deepgram’s Aura-2 could be a wild card entry for enterprise use cases.

Deepgram, a voice AI platform, on Tuesday launched Aura-2, its next-generation text-to-speech (TTS) model. The company calls it the world’s most professional and cost-effective enterprise-grade TTS solution. 

In blind user preference tests focused on conversational enterprise applications, the model outperformed leading competitors like ElevenLabs, Cartesia, and OpenAI.

Aura-2 is built on top of Deepgram Enterprise Runtime (DER), a custom infrastructure layer for its speech models. It aims to provide domain-specific pronunciation, professional voice quality, and context-aware delivery in the speech it generates.

With this, developers can enhance real-time enterprise interactions across various use cases, including customer service, virtual agents, and AI-powered assistants. 

Aura-2 can be deployed via cloud or on-premises APIs. Moreover, new users will receive $200 in free credits to try the model’s capabilities on the official website.

The company points to a significant gap in enterprise-optimised voice AI, which requires both a natural-sounding voice and domain-specific pronunciation. Deepgram’s Aura-2 attempts to bridge this gap for business-critical environments.

“In head-to-head comparisons across enterprise scenarios, Deepgram came out on top nearly 60% of the time,” the company stated. As per the chart shared, users preferred Aura-2 61.8% of the time, compared with 38.2% for ElevenLabs. Similarly, users preferred Aura-2 52% of the time, compared with 48% for OpenAI.

When asked about the model’s different use cases, Natalie Rutgers, VP of product for Deepgram, told AIM: “While people can use Aura-2 for podcasts and other entertainment use cases, that isn’t our focus with this offering. Our customers care about having real-time voices that represent the people you’d hear at your appointments, your pharmacy, and your customer service lines.”

Rutgers also mentioned that the model supports English voices, including British and Australian accents, with multilingual support underway.

Deepgram’s Aura-2 is also optimised for real-time performance. The company claims it delivers fast response times, with a sub-150ms time-to-first-byte.
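To make the time-to-first-byte figure concrete, here is a minimal Python sketch of how such a latency could be measured against any streaming audio source. It is not Deepgram's benchmark method or API: `time_to_first_byte` and `fake_audio_stream` are hypothetical helpers, and the stand-in stream simply sleeps before yielding bytes. For a real measurement, the iterable would need to wrap the actual request so that the timer also covers sending it.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_byte(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (milliseconds until the first non-empty chunk, that chunk)."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            return elapsed_ms, chunk
    raise ValueError("stream ended without producing any audio bytes")


def fake_audio_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (assumption)."""
    time.sleep(delay_s)   # simulated synthesis and network delay
    yield b"\x00\x01"     # first audio chunk
    yield b"\x02\x03"     # later chunks are never read by the timer


if __name__ == "__main__":
    ttfb_ms, first_chunk = time_to_first_byte(fake_audio_stream())
    print(f"time-to-first-byte: {ttfb_ms:.1f} ms ({len(first_chunk)} bytes)")
    print("under 150 ms" if ttfb_ms < 150 else "150 ms or more")
```

Because the generator body only starts running on the first iteration, the simulated delay falls inside the timed window, which mirrors how a lazily issued request would be timed.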

Deepgram also claims Aura-2 offers the lowest pricing compared to ElevenLabs Flash and Cartesia Sonic. Deepgram explains, “At $0.030 per 1,000 characters, it offers substantial savings compared to alternatives like Elevenlabs Turbo ($0.050) and Cartesia Sonic ($0.038).”
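As a rough illustration of what those per-character rates mean at scale, the following Python sketch compares the three quoted prices. The rates come from Deepgram's statement above; the monthly volume of 10 million characters is a hypothetical figure chosen only for the arithmetic, and the helper names are illustrative.

```python
# Prices per 1,000 characters in USD, as quoted by Deepgram in the article.
PRICE_PER_1K_CHARS = {
    "Deepgram Aura-2": 0.030,
    "ElevenLabs Turbo": 0.050,
    "Cartesia Sonic": 0.038,
}


def tts_cost(characters: int, price_per_1k: float) -> float:
    """Cost in USD of synthesising `characters` at a per-1,000-character rate."""
    return characters / 1000 * price_per_1k


def savings_percent(cheaper: str, pricier: str) -> float:
    """Percentage saved by using `cheaper` instead of `pricier`."""
    low = PRICE_PER_1K_CHARS[cheaper]
    high = PRICE_PER_1K_CHARS[pricier]
    return (high - low) / high * 100


if __name__ == "__main__":
    monthly_characters = 10_000_000  # hypothetical volume (assumption)
    for name, price in PRICE_PER_1K_CHARS.items():
        print(f"{name}: ${tts_cost(monthly_characters, price):,.2f} per month")
    for rival in ("ElevenLabs Turbo", "Cartesia Sonic"):
        pct = savings_percent("Deepgram Aura-2", rival)
        print(f"Saving vs {rival}: {pct:.1f}%")
```

At the quoted rates, the difference works out to roughly 40% against ElevenLabs Turbo and roughly 21% against Cartesia Sonic, regardless of volume, since all three prices scale linearly with characters.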

The company states that usage-based pricing eliminates quality/cost tradeoffs, enabling uniform voice experiences at every touchpoint while maintaining performance and managing costs. 

Ankush Das

I am a tech aficionado and a computer science graduate with a keen interest in AI, Open Source, and Cybersecurity.
