Microsoft’s Copilot Vision Learns to See, Shop, and Scroll Like You

Last updated December 5, 2024


Microsoft has launched Copilot Vision, an experimental feature integrated into the Edge browser, enabling AI-assisted browsing by analysing web pages in real time. The feature is available in preview for select Copilot Pro subscribers in the United States through Copilot Labs, the company announced on its blog.

Maybe you shouldn't play favorites with features, but Copilot Vision is an update I've been excited about since day one. It's the first AI experience of its kind—now you can show, not tell, AI what you want help with in real time as you scroll, shop, guess locations…🌏 pic.twitter.com/DJNmgMuw7D

— Mustafa Suleyman (@mustafasuleyman) December 5, 2024

The opt-in service allows Copilot Vision to read and analyse web pages with user permission, providing insights, simplifying information, and assisting with tasks such as holiday shopping or planning outings. “Browsing no longer needs to be a lonely experience with just you and all your tabs,” the Copilot Team stated in the announcement.

Vision provides support by scanning and interpreting the page’s content, helping users make decisions or learn from the information presented. For instance, Vision can guide users through learning new games or finding specific products that match their preferences during online shopping.

Microsoft emphasised privacy and security in the development of Vision, ensuring that user data is not stored after a session ends and is handled in line with the company’s privacy policies. The company said, “Only Copilot’s responses are logged to improve our safety systems.”

Currently, Vision is limited to interacting with a select number of websites. Microsoft plans to expand its availability gradually, gathering user feedback to refine the experience. The company is also collaborating with third-party publishers to enhance how Vision interacts with web pages.

“Vision does not capture, store, or use any data from publishers to train our models,” the blog post added. 

Similar to Copilot Vision, OpenAI is set to launch its agent, Operator, in January.

Meanwhile, Google is working on an experimental AI assistant called “Jarvis”, likely to be powered by Gemini 2.0. Jarvis operates within Chrome and interacts with on-screen elements like fields and buttons, handling complex tasks such as booking flights and assisting with online shopping.

Similarly, Anthropic has launched a ‘computer use’ capability for Claude 3.5 Sonnet, which allows the AI to autonomously perform tasks like moving the mouse, clicking, and typing. Targeted at software developers, it can handle complex activities such as coding a basic website or planning outings across different applications.
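For developers curious about what this looks like in practice, the sketch below is not from the article; it is a minimal illustration based on Anthropic’s public computer-use beta, and the model name, tool type, beta flag and prompt are assumptions that may have changed since launch. The key point it shows is that the model does not control the machine directly: it returns proposed actions that the developer’s own agent loop must execute.

```python
# Minimal sketch (assumed details): requesting Anthropic's computer-use tool
# via the Python SDK. Identifiers follow the public beta naming and may change.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    # Hypothetical task, for illustration only.
    messages=[{"role": "user", "content": "Open the browser and search for flights to Goa."}],
)

# The model emits tool_use blocks (e.g. screenshot, mouse_move, left_click, type);
# the developer's agent loop executes them on a real or virtual desktop and feeds
# the results back in follow-up messages.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```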


Siddharth Jindal

Siddharth is a media graduate who loves exploring tech through journalism and putting forward ideas worth pondering in the era of artificial intelligence.
