Meta’s Apollo Models Set New Benchmarks for Video Understanding in AI


Published on December 18, 2024
In AI News

Apollo models excel at video tasks by addressing key design questions, including how videos are sampled and encoded and how the models are trained.

Meta AI and Stanford have introduced Apollo, a family of video-based large multimodal models (LMMs) designed to efficiently and accurately understand video content. Apollo aims to bridge the gap between text-to-image models and video comprehension, addressing challenges posed by high computational demands and technical limitations.

The researchers’ paper describing Apollo is especially timely in light of OpenAI co-founder Ilya Sutskever’s recent talk on pre-training hitting a wall.

Apollo leverages scaling consistency to reduce reliance on large datasets and models while improving task-specific performance.

“We discovered scaling consistency, which allows us to design effective solutions using smaller models and datasets, reducing computational overhead,” the researchers explained in the paper.

Two key design choices improve video understanding. First, fps sampling, which takes frames at a fixed rate per second of video, works better than uniform sampling, which spreads a fixed number of frames evenly across the clip regardless of its length. Second, combining SigLIP-SO400M (which focuses on clear image details) with InternVideo2 (which captures motion and timing) helps the model understand both still visuals and movement in videos.
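To make the difference between the two sampling strategies concrete, here is a minimal Python sketch. It is an illustration only, not the Apollo implementation: the function names, the 30 fps source rate, the 2 fps target rate, and the 8-frame budget are all assumptions chosen for the example.

```python
def uniform_sample_indices(total_frames: int, num_samples: int) -> list[int]:
    """Uniform sampling: a fixed number of frames spread evenly over the clip.

    Hypothetical helper for illustration; not taken from the Apollo code.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    segment = total_frames / num_samples
    # Take the frame at the middle of each equal-length segment.
    return [int(segment * i + segment / 2) for i in range(num_samples)]


def fps_sample_indices(total_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """fps sampling: frames taken at a fixed rate per second of video.

    Hypothetical helper for illustration; not taken from the Apollo code.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    # Number of source frames between two sampled frames (never below 1,
    # so a target rate above the native rate does not repeat frames).
    stride = max(1.0, native_fps / target_fps)
    indices = []
    k = 0
    while int(k * stride) < total_frames:
        indices.append(int(k * stride))
        k += 1
    return indices


if __name__ == "__main__":
    # Assumed example clips: 10 s and 60 s, both recorded at 30 fps.
    for seconds in (10, 60):
        frames = seconds * 30
        uniform = uniform_sample_indices(frames, num_samples=8)
        fixed_rate = fps_sample_indices(frames, native_fps=30.0, target_fps=2.0)
        print(seconds, "s clip:", len(uniform), "uniform frames,",
              len(fixed_rate), "fps-sampled frames")
```

With these assumed values, uniform sampling returns 8 frames for both clips, so the time gap between sampled frames grows with video length. fps sampling returns 20 frames for the 10-second clip and 120 for the 60-second clip, so the gap between frames stays at half a second and the model sees motion at a consistent speed.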

Smaller Models with Superior Performance

Apollo-3B outperforms larger 7B models with a score of 68.7 on the MLVU benchmark. Meanwhile, Apollo-7B sets a new standard in its category, scoring 70.9 on MLVU and even surpassing some 30B models.

The team also introduced ApolloBench, a faster, more efficient benchmark for video understanding that cuts evaluation time by a factor of 41. “Our results prove that smart design and training can deliver top performance without relying on massive model sizes,” the researchers said.

Apollo marks a significant leap in video AI, opening doors to applications like content analysis and autonomous systems.

Aditi Suresh

Aditi is a political science graduate interested in technology, AI, social media, and online culture.
