Google Crashes Copilot Vision, Computer Use Party with Mariner

  • Published on December 11, 2024
  • In AI News

But how will it compete with OpenAI's 'Project Operator', which is set to be released soon?

Google has announced Project Mariner, an early-stage research prototype that understands and reasons over the information available while a user navigates in a web browser. It is built on top of Google’s latest model, Gemini 2.0.

Google also says that the agent works through a Google Chrome extension, using the information it sees on the screen to complete related tasks. The agent can read text, code, images and forms, and can even take voice-based instructions.

The agent is also capable of navigating and interacting with websites on the user’s behalf and automating certain tasks.

In a demo video, the company showcased Project Mariner’s capabilities. The agent was prompted to find a painting by ‘the most famous post-impressionist’ on Google Arts and Culture, and the same prompt included an unrelated task: adding ‘colourful paints’ to an Etsy cart.

Project Mariner then fed the instructions to Gemini to identify the artist and the painting, fetched the details, and automatically took the user to Google Arts and Culture, where it searched for the painting. For the next task, it navigated to Etsy and added a set of watercolours to the shopping cart.

During the process, Project Mariner interpreted the instructions and broke them down into step-by-step, actionable tasks. The tool performed its actions in the active tab rather than in the background.

Project Mariner is available through a ‘Trusted Tester Waitlist’. Along with this announcement, Google also officially unveiled the Gemini 2.0 family of models, starting with Gemini 2.0 Flash. 

Google also announced updates to Project Astra, including better dialogue and memory capabilities and the ability to use external tools. In addition, it unveiled Jules, an AI coding agent that can be integrated directly into a GitHub workflow.

That said, Google’s agent arrived just days after Microsoft announced Copilot Vision as an experimental feature. 

Copilot Vision can read and analyse web pages and can provide relevant summaries and information to the user. However, unlike Project Mariner, Copilot Vision cannot take actions on behalf of the user. 

That leaves Anthropic’s Computer Use as Google’s only real competitor: it not only performs autonomous actions but is also not restricted to a browser environment. Many developers are already experimenting with Computer Use, and most recently, Hume AI explored a capability that lets users control their desktop using only their voice.

It will be interesting to see what OpenAI’s rumoured ‘Project Operator’ looks like. A few days ago, OpenAI demonstrated an agent based on GPT-4o at the GenerationAI Conference in Paris, where it helped resolve customer issues.

It is possible that OpenAI will officially announce features along these lines during its ongoing ‘12 Days of OpenAI’ event.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
