- Published on December 11, 2024
- In AI News
“2025 will be the year of AI agents and Gemini 2.0 will be the generation of models that underpin our agent-based work.”
As predicted by AIM, Google has finally launched Gemini 2.0, its next-generation AI model, built to redefine multimodal capabilities and introduce agentic functionalities.
“Today we’re excited to launch our next era of models built for this new agentic era: introducing Gemini 2.0, our most capable model yet. With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant,” Google said in a blog post.
“This is really just the beginning. 2025 will be the year of AI agents and Gemini 2.0 will be the generation of models that underpin our agent-based work,” said Google DeepMind chief Demis Hassabis.
Gemini 2.0 Flash supports multimodal inputs, including images, video, and audio, as well as multimodal outputs such as natively generated images combined with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, execute code, and integrate third-party user-defined functions.
Gemini 2.0 Flash offers faster response times and outperforms even Gemini 1.5 Pro on key benchmarks. Developers can access Gemini 2.0 Flash through Google AI Studio and Vertex AI, with general availability expected in January 2025.
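As an illustration of what developer access and native tool use look like in practice, here is a minimal sketch using the google-generativeai Python SDK with an API key from Google AI Studio. The experimental model name "gemini-2.0-flash-exp" and the get_flight_status helper are illustrative assumptions, not details from Google's announcement.

```python
# Minimal sketch: calling Gemini 2.0 Flash with a user-defined function as a tool.
# Assumes the google-generativeai SDK, a Google AI Studio API key, and the
# experimental model name "gemini-2.0-flash-exp" (hypothetical for illustration).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key generated in Google AI Studio


def get_flight_status(flight_number: str) -> str:
    """Hypothetical third-party tool the model is allowed to call."""
    return f"Flight {flight_number} is on time."


# Register the Python function as a tool; the SDK builds the schema from the
# type hints and docstring.
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",
    tools=[get_flight_status],
)

# Automatic function calling lets the SDK execute the tool when the model
# requests it and return the final text answer.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("Is flight LH455 on time?")
print(response.text)
```

The same interface also accepts image, audio and video parts alongside text, which is how the multimodal inputs described above are exercised through the API.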
Google has also launched the Multimodal Live API, which supports real-time, streaming audio and video inputs, allowing developers to create dynamic, interactive applications.
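Unlike one-off generation requests, the Multimodal Live API keeps a bidirectional session open. A minimal text-only sketch, assuming the google-genai Python SDK's live.connect interface and the experimental "gemini-2.0-flash-exp" model name, might look like this:

```python
# Sketch of a Multimodal Live API session. Assumes the google-genai SDK
# (client.aio.live.connect), the v1alpha API version used at launch, and the
# experimental model name "gemini-2.0-flash-exp".
import asyncio
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"api_version": "v1alpha"},  # Live API shipped under v1alpha
)
config = {"response_modalities": ["TEXT"]}  # audio output is also supported


async def main():
    # Open a real-time, bidirectional session with the model.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, Gemini.", end_of_turn=True)
        # Stream the model's response chunks as they arrive.
        async for chunk in session.receive():
            if chunk.text:
                print(chunk.text, end="")


asyncio.run(main())
```

In a real application the same session would stream microphone audio or camera frames instead of typed text, which is what enables the low-latency, interactive experiences the API is aimed at.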
Project Astra and AI Agents
Introduced at Google I/O 2024, Project Astra, Google's research prototype of a universal AI assistant, has received several updates. It now supports multilingual and mixed-language conversations, with an improved understanding of accents and uncommon words.
Powered by Gemini 2.0, Project Astra can also utilise Google Search, Lens, and Maps, making it a more practical assistant for daily tasks. Its memory has been enhanced, allowing up to 10 minutes of in-session recall and better personalisation through past interactions. Additionally, improved streaming and native audio processing reduce latency, enabling near-human conversational speeds.
Google has also announced Project Mariner, an early-stage research prototype that can understand and reason across information on the screen as a user navigates a web browser.
Google says the agent works through a Google Chrome extension, using what it sees on the screen to complete tasks. It can read on-screen elements such as text, code, images and forms, and can also take voice-based instructions.
“Book a flight from SF to Berlin, departing on March 5 and returning on the 12. The era of being able to give a computer a fairly complex high-level task and have it go off and do a lot of the work for you is becoming a reality,” said Jeff Dean, chief scientist at Google DeepMind.
Google has also introduced Jules, a developer-focused agent that integrates with GitHub workflows to assist with coding tasks under developer supervision.
Google DeepMind is working on AI agents that improve video games and navigate 3D worlds. It has partnered with game developers like Supercell to explore the future of AI-powered gaming companions. Gemini 2.0’s spatial reasoning is also being tested in robotics for practical real-world use.
Notably, it recently launched Genie 2, a large-scale foundation world model capable of generating a wide variety of playable 3D environments.
Siddharth Jindal
Siddharth is a media graduate who loves exploring tech through journalism and putting forward ideas worth pondering in the era of artificial intelligence.