Deepgram’s New Text-to-Speech AI Model Outperforms ElevenLabs and OpenAI

Published on April 15, 2025

Deepgram’s Aura-2 could be a wild card entry for enterprise use cases.

Deepgram, a voice AI platform, on Tuesday launched Aura-2, its next-generation text-to-speech (TTS) model. The company calls it the world’s most professional and cost-effective enterprise-grade TTS solution.

In blind user tests focused on conversational enterprise applications, the model outperformed leading competitors such as ElevenLabs, Cartesia, and OpenAI, according to the company.

Aura-2 is built on top of Deepgram Enterprise Runtime (DER), a custom infrastructure layer for its speech models. It aims to provide domain-specific pronunciation, professional voice quality, and context-aware delivery in the speech it generates.

With this, developers can enhance real-time enterprise interactions across various use cases, including customer service, virtual agents, and AI-powered assistants.

Aura-2 can be deployed via cloud or on-premises APIs. Moreover, new users will receive $200 in free credits to try the model’s capabilities on the official website.

The company points to a significant gap in enterprise-optimised voice AI, which requires both a natural-sounding voice and domain-specific pronunciation. Deepgram’s Aura-2 attempts to bridge this gap for business-critical environments.

“In head-to-head comparisons across enterprise scenarios, Deepgram came out on top nearly 60% of the time,” the company stated. According to the chart it shared, users preferred Aura-2 61.8% of the time against ElevenLabs (38.2%) and 52% of the time against OpenAI (48%).

When asked about the model’s different use cases, Natalie Rutgers, VP of product for Deepgram, told AIM: “While people can use Aura-2 for podcasts and other entertainment use cases, that isn’t our focus with this offering. Our customers care about having real-time voices that represent the people you’d hear at your appointments, your pharmacy, and your customer service lines.”

Rutgers also mentioned that the model supports English voices, including British and Australian accents, with multilingual support underway.

Deepgram’s Aura-2 is also optimised for real-time performance. The company claims it delivers fast response times, with a sub-150ms time-to-first-byte.
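For readers unfamiliar with the metric, time-to-first-byte is the delay between requesting speech and receiving the first chunk of audio. The sketch below shows one way a developer might measure it for any streaming audio source. It does not call Deepgram's API: `time_to_first_byte` and `fake_tts_stream` are hypothetical names, and the simulated 50ms delay is an illustrative assumption, not a measured figure.

```python
import time
from typing import Iterable, Iterator, Optional


def time_to_first_byte(chunks: Iterable[bytes]) -> Optional[float]:
    """Return milliseconds until the first non-empty chunk arrives, or None."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # ignore empty keep-alive chunks
            return (time.perf_counter() - start) * 1000.0
    return None  # the stream ended without producing audio


def fake_tts_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (assumption)."""
    time.sleep(delay_s)   # simulated synthesis and network delay
    yield b"\x00" * 320   # first audio chunk
    yield b"\x00" * 320   # subsequent audio


if __name__ == "__main__":
    ttfb_ms = time_to_first_byte(fake_tts_stream())
    print(f"time-to-first-byte: {ttfb_ms:.1f} ms")
```

With a real service, the timer would need to start before the request is sent, so that connection setup and network latency are included in the measurement.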

The company also claims lower pricing than ElevenLabs and Cartesia. Deepgram explains, “At $0.030 per 1,000 characters, it offers substantial savings compared to alternatives like ElevenLabs Turbo ($0.050) and Cartesia Sonic ($0.038).”
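To put the quoted rates in perspective, the following sketch works through the arithmetic. The per-1,000-character prices are those cited by Deepgram above; the function names and the 10-million-character monthly volume are hypothetical choices for illustration, and actual bills may differ with plan tiers or discounts.

```python
# Per-1,000-character prices in USD, as quoted in the article.
PRICE_PER_1K_CHARS = {
    "Deepgram Aura-2": 0.030,
    "ElevenLabs Turbo": 0.050,
    "Cartesia Sonic": 0.038,
}


def monthly_cost(chars_per_month: int, price_per_1k: float) -> float:
    """Cost in USD of synthesising a given number of characters."""
    return chars_per_month / 1000 * price_per_1k


def saving_vs(cheaper: str, pricier: str) -> float:
    """Fraction saved by choosing `cheaper` over `pricier` (0.4 means 40%)."""
    return 1 - PRICE_PER_1K_CHARS[cheaper] / PRICE_PER_1K_CHARS[pricier]


if __name__ == "__main__":
    volume = 10_000_000  # hypothetical monthly volume (assumption)
    for name, price in PRICE_PER_1K_CHARS.items():
        print(f"{name}: ${monthly_cost(volume, price):,.2f} per month")
    for rival in ("ElevenLabs Turbo", "Cartesia Sonic"):
        print(f"Saving vs {rival}: {saving_vs('Deepgram Aura-2', rival):.0%}")
```

At that volume, the quoted rates work out to roughly $300 a month for Aura-2, against $500 for ElevenLabs Turbo and $380 for Cartesia Sonic, a saving of 40% and about 21% respectively.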

The company states that usage-based pricing eliminates quality/cost tradeoffs, enabling uniform voice experiences at every touchpoint while maintaining performance and managing costs.