Groq Unveils LLaVA v1.5 7B, Faster than OpenAI's GPT-4o

Last updated September 5, 2024 • In AI News


Illustration by Raghavendra Rao

Groq has introduced LLaVA v1.5 7B, a new visual model now available on its Developer Console. This launch makes GroqCloud multimodal and broadens its support to include image, audio, and text modalities.

🚨 New *multi-modal* model dropped on @Groqinc! Llava v1.5 7b is a visual model that can take images as input.

⚡️Try it now in API or console as “llava-v1.5-7b-4096-preview”!

Developers can now build applications on Groq with all three modalities: image, audio, and text! pic.twitter.com/px90CVtPLq

— Benjamin Klieger (@BenjaminKlieger) September 4, 2024
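
The tweet above includes the model ID needed to call the new endpoint. As a minimal sketch (not from the original article), a request might look like the following in Python, assuming the Groq SDK's OpenAI-compatible chat completions interface and a placeholder image URL:

```python
# Minimal sketch: calling the preview model via the Groq Python SDK.
# Assumes GROQ_API_KEY is set in the environment and that the preview
# endpoint accepts image_url content parts; the image URL is a placeholder.
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llava-v1.5-7b-4096-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.jpg"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```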

LLaVA, short for Large Language and Vision Assistant, combines language and vision capabilities. It builds on OpenAI’s CLIP and Meta’s Llama 2 7B model, utilising visual instruction tuning to enhance image-based natural instruction following and visual reasoning. 

This enables LLaVA to excel in tasks such as visual question answering, caption generation, optical character recognition, and multimodal dialogue.
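
For readers who want to see how those pieces fit together outside GroqCloud, here is a rough sketch using the Hugging Face transformers port of LLaVA-1.5 (the llava-hf/llava-1.5-7b-hf checkpoint, an assumption of this example rather than something Groq ships). It shows the CLIP-style vision encoder and Llama-family language model being driven through a single processor for an OCR-style question:

```python
# Rough sketch, assuming the Hugging Face transformers port of LLaVA-1.5.
# Requires a GPU with enough memory for the 7B weights; the image URL is a placeholder.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA pairs a CLIP vision encoder with a Llama-family language model:
# the processor encodes the image into visual tokens and splices them into
# the text prompt at the <image> placeholder.
image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)
prompt = "USER: <image>\nWhat text appears in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```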

"In our initial benchmarking, response times were >4X faster than GPT-4o on OpenAI," said benchmarking firm Artificial Analysis, which noted that the Groq-hosted LLaVA-v1.5-7B supports vision/image inputs.

Groq has launched their first multi-modal endpoint! Groq is hosting LLaVA-v1.5-7B which supports vision/image inputs and in our initial benchmarking response times were >4X faster than GPT-4o on OpenAI.

We have conducted initial benchmarking comparing the response speed of… pic.twitter.com/bHFDSeVPaZ

— Artificial Analysis (@ArtificialAnlys) September 4, 2024
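
Artificial Analysis has not detailed its methodology in the tweet above, but a rough, hypothetical end-to-end latency comparison could be run from Python as sketched below (assuming the Groq and OpenAI SDKs, valid API keys in the environment, and a placeholder image URL; a careful benchmark would measure time-to-first-token over many runs with identical prompts and images):

```python
# Hypothetical latency comparison sketch; not Artificial Analysis's methodology.
import time

from groq import Groq
from openai import OpenAI

IMAGE_URL = "https://example.com/sample.jpg"  # placeholder
MESSAGES = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": IMAGE_URL}},
        ],
    }
]


def timed(label, fn):
    """Run fn once and print wall-clock latency."""
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s end-to-end")


groq_client = Groq()
openai_client = OpenAI()

timed(
    "Groq llava-v1.5-7b-4096-preview",
    lambda: groq_client.chat.completions.create(
        model="llava-v1.5-7b-4096-preview", messages=MESSAGES, max_tokens=64
    ),
)
timed(
    "OpenAI gpt-4o",
    lambda: openai_client.chat.completions.create(
        model="gpt-4o", messages=MESSAGES, max_tokens=64
    ),
)
```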

The new model unlocks numerous practical applications. Retailers can use it for inventory tracking, social media platforms can improve accessibility with image descriptions, and customer service chatbots can handle text and image-based interactions. 

Additionally, it can help automate tasks in industries such as manufacturing, finance, retail, and education, streamlining processes and improving efficiency.

Developers and businesses can access LLaVA v1.5 7B in Preview Mode on GroqCloud.

Groq recently partnered with Meta, making the latest Llama 3.1 models—including 405B Instruct, 70B Instruct, and 8B Instruct—available to the community at Groq speed.

Former OpenAI researcher Andrej Karpathy praised Groq’s inference speed, saying, “This is so cool. It feels like AGI—you just talk to your computer and it does stuff instantly. Speed really makes AI so much more pleasing.”

Founded in 2016 by Jonathan Ross, Groq distinguishes itself by eschewing GPUs in favour of its proprietary hardware, the LPU (Language Processing Unit).


Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism, putting forward ideas worth pondering in the era of artificial intelligence.
