Groq Unveils LLaVA v1.5 7B, Faster than OpenAI's GPT-4o

Last updated September 5, 2024 • In AI News


Illustration by Raghavendra Rao

Groq has introduced LLaVA v1.5 7B, a new visual model now available on its Developer Console. This launch makes GroqCloud multimodal and broadens its support to include image, audio, and text modalities.

🚨 New *multi-modal* model dropped on @Groqinc! Llava v1.5 7b is a visual model that can take images as input.

⚡️Try it now in API or console as “llava-v1.5-7b-4096-preview”!

Developers can now build applications on Groq with all three modalities: image, audio, and text! pic.twitter.com/px90CVtPLq

— Benjamin Klieger (@BenjaminKlieger) September 4, 2024
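
The tweet above includes the model ID needed to call the new endpoint. As a minimal sketch (not from the original article), a request might look like the following in Python, assuming the Groq SDK's OpenAI-compatible chat completions interface and a placeholder image URL:

```python
# Minimal sketch: calling the preview model via the Groq Python SDK.
# Assumes GROQ_API_KEY is set in the environment and that the preview
# endpoint accepts image_url content parts; the image URL is a placeholder.
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llava-v1.5-7b-4096-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.jpg"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```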

LLaVA, short for Large Language and Vision Assistant, combines language and vision capabilities. It builds on OpenAI’s CLIP and Meta’s Llama 2 7B model, utilising visual instruction tuning to enhance image-based natural instruction following and visual reasoning. 

This enables LLaVA to excel in tasks such as visual question answering, caption generation, optical character recognition, and multimodal dialogue.
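
For readers who want to see how those pieces fit together outside GroqCloud, here is a rough sketch using the Hugging Face transformers port of LLaVA-1.5 (the llava-hf/llava-1.5-7b-hf checkpoint, an assumption of this example rather than something Groq ships). It shows the CLIP-style vision encoder and Llama-family language model being driven through a single processor for an OCR-style question:

```python
# Rough sketch, assuming the Hugging Face transformers port of LLaVA-1.5.
# Requires a GPU with enough memory for the 7B weights; the image URL is a placeholder.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA pairs a CLIP vision encoder with a Llama-family language model:
# the processor encodes the image into visual tokens and splices them into
# the text prompt at the <image> placeholder.
image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)
prompt = "USER: <image>\nWhat text appears in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```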

"In our initial benchmarking, response times were >4X faster than GPT-4o on OpenAI," said benchmarking firm Artificial Analysis, which noted that the Groq-hosted LLaVA-v1.5-7B supports vision/image inputs.

Groq has launched their first multi-modal endpoint! Groq is hosting LLaVA-v1.5-7B which supports vision/image inputs and in our initial benchmarking response times were >4X faster than GPT-4o on OpenAI.

We have conducted initial benchmarking comparing the response speed of… pic.twitter.com/bHFDSeVPaZ

— Artificial Analysis (@ArtificialAnlys) September 4, 2024
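
Artificial Analysis has not detailed its methodology in the tweet above, but a rough, hypothetical end-to-end latency comparison could be run from Python as sketched below (assuming the Groq and OpenAI SDKs, valid API keys in the environment, and a placeholder image URL; a careful benchmark would measure time-to-first-token over many runs with identical prompts and images):

```python
# Hypothetical latency comparison sketch; not Artificial Analysis's methodology.
import time

from groq import Groq
from openai import OpenAI

IMAGE_URL = "https://example.com/sample.jpg"  # placeholder
MESSAGES = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": IMAGE_URL}},
        ],
    }
]


def timed(label, fn):
    """Run fn once and print wall-clock latency."""
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s end-to-end")


groq_client = Groq()
openai_client = OpenAI()

timed(
    "Groq llava-v1.5-7b-4096-preview",
    lambda: groq_client.chat.completions.create(
        model="llava-v1.5-7b-4096-preview", messages=MESSAGES, max_tokens=64
    ),
)
timed(
    "OpenAI gpt-4o",
    lambda: openai_client.chat.completions.create(
        model="gpt-4o", messages=MESSAGES, max_tokens=64
    ),
)
```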

The new model unlocks numerous practical applications. Retailers can use it for inventory tracking, social media platforms can improve accessibility with image descriptions, and customer service chatbots can handle text and image-based interactions. 

Additionally, it can help automate tasks in industries such as manufacturing, finance, retail, and education, streamlining processes and improving efficiency.

Developers and businesses can access LLaVA v1.5 7B in Preview Mode on GroqCloud.

Groq recently partnered with Meta, making the latest Llama 3.1 models—including 405B Instruct, 70B Instruct, and 8B Instruct—available to the community at Groq speed.

Former OpenAI researcher Andrej Karpathy praised Groq’s inference speed, saying, “This is so cool. It feels like AGI—you just talk to your computer and it does stuff instantly. Speed really makes AI so much more pleasing.”

Founded in 2016 by Jonathan Ross, Groq distinguishes itself by eschewing GPUs in favour of its proprietary hardware, the LPU (Language Processing Unit).


Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism, putting forward ideas worth pondering in the era of artificial intelligence.
