Alibaba Releases Qwen2.5-Omni, Adds Voice and Video Modes to Qwen Chat

  • Published on March 27, 2025

Alibaba's Qwen2.5-Omni model has shown strong performance across various tasks, including speech recognition, audio, video, and more.


On Wednesday, Alibaba added voice and video chat capabilities to Qwen Chat and released Qwen2.5-Omni-7B, the new model that makes them possible. The model is open source under the Apache 2.0 licence.

In a blog post, the company described Qwen2.5-Omni as the new flagship end-to-end multimodal model in the Qwen series. According to the post, the model is designed for multimodal perception: it processes text, images, audio, and video, and delivers real-time streaming responses as both text and synthesised speech.

The key features of the model include a ‘Thinker-Talker’ architecture, which allows it to respond in real time. The Thinker, a Transformer decoder, acts like the brain, while the Talker, designed as a dual-track autoregressive Transformer decoder, operates like the human mouth.
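The division of labour between the two parts can be pictured with a small toy pipeline. The sketch below is purely illustrative and is not Alibaba's code: the names `ThinkerStep`, `thinker`, `talker`, and `stream_reply` are hypothetical, the "hidden state" is a placeholder number rather than a learned representation, and it assumes the Talker consumes the Thinker's output step by step, which is how the streaming behaviour is generally described.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class ThinkerStep:
    """One decoding step of the hypothetical Thinker."""
    text_token: str      # token that would be shown to the user as text
    hidden: List[float]  # stand-in for the decoder's hidden representation


def thinker(prompt: str) -> Iterator[ThinkerStep]:
    """Toy 'brain': yields one text token at a time with a fake hidden state."""
    for token in prompt.split():
        # Assumption: a real decoder would emit a learned vector here.
        # The token length is only a deterministic placeholder.
        yield ThinkerStep(text_token=token, hidden=[float(len(token))])


def talker(steps: Iterable[ThinkerStep]) -> Iterator[Tuple[str, str]]:
    """Toy 'mouth': turns each Thinker step into a placeholder audio token."""
    for step in steps:
        # Speech is produced as soon as each step arrives, without
        # waiting for the Thinker to finish the whole reply.
        audio_token = f"<audio:{step.text_token.lower()}:{int(step.hidden[0])}>"
        yield step.text_token, audio_token


def stream_reply(prompt: str) -> List[Tuple[str, str]]:
    """Collect the interleaved (text, audio) stream for inspection."""
    return list(talker(thinker(prompt)))


if __name__ == "__main__":
    for text, audio in talker(thinker("Good morning")):
        print(text, audio)
```

Because both stages are generators, the first audio token is available after the Thinker's first step, which is the property that makes real-time responses possible in a design of this kind.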

Alibaba’s Qwen2.5-Omni model has shown strong performance across a range of tasks, including speech recognition, translation, audio and video understanding, and speech generation, and it outperforms similar models on tasks that require multiple modalities.

It was compared with single-modality models such as Qwen2.5-VL-7B and Qwen2-Audio, and with closed-source models such as Gemini-1.5-pro, and is reported to achieve state-of-the-art performance.


The paper and code for the new model can be found on GitHub, while the AI model is available on Hugging Face along with a demo.

Last month, Alibaba also launched QwQ-Max-Preview, a new AI reasoning model within the Qwen family that specialises in mathematics and coding tasks and features a “thinking” capability in the Qwen Chat application.

The model, which outperformed OpenAI’s models on the LiveCodeBench leaderboard, is expected to have smaller variants open-sourced for local device deployment, as well as a dedicated mobile app.

There may be a lot more coming, considering Alibaba’s commitment to investing over $52 billion in AI over the next three years.


Ankush Das

I am a tech aficionado and a computer science graduate with a keen interest in AI, Open Source, and Cybersecurity.
