- Published on December 18, 2024
Apollo models excel at video tasks by addressing key challenges, including how videos are sampled and encoded and how the models are trained on them.
Meta AI and Stanford have introduced Apollo, a family of video-based large multimodal models (LMMs) designed to understand video content efficiently and accurately. Apollo aims to bridge the gap between image-focused multimodal models and video comprehension, addressing challenges posed by high computational demands and technical limitations.
The paper is especially significant in light of OpenAI co-founder Ilya Sutskever’s recent talk on pre-training hitting a wall.
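Apollo relies on what the researchers call scaling consistency: design decisions tested on smaller models and datasets carry over to larger ones. This reduces the need for large models and datasets during design exploration while still improving task-specific performance.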
“We discovered scaling consistency, which allows us to design effective solutions using smaller models and datasets, reducing computational overhead,” the researchers explained in the paper.
Two design choices in particular make the models better at understanding videos. First, fps sampling selects frames at a fixed rate per second of video, which works better than uniform sampling, where a fixed number of frames is spread evenly across the clip regardless of its length. Second, combining SigLIP-SO400M (which captures fine image detail) with InternVideo2 (which captures motion and timing) helps the model understand both static visuals and movement in videos.
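The difference between the two sampling strategies is easiest to see by counting frames. The sketch below is a simplified illustration, not Apollo's actual preprocessing code; the function names and the choice of 8 uniform frames and a 2 fps target rate are assumptions made for the example.

```python
def uniform_sample_indices(total_frames, num_samples):
    """Uniform sampling: a fixed number of frames spread evenly across the
    whole video, so the effective frame rate drops as the video gets longer."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    step = total_frames / num_samples
    # Take the middle frame of each of num_samples equal-length segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


def fps_sample_indices(total_frames, video_fps, target_fps):
    """fps sampling: frames taken at a fixed rate per second of video, so the
    number of sampled frames grows with video length."""
    if total_frames <= 0 or video_fps <= 0 or target_fps <= 0:
        return []
    step = video_fps / target_fps  # source frames between consecutive samples
    indices = []
    t = 0.0
    while t < total_frames:
        indices.append(int(t))
        t += step
    return indices


# A 10 s clip and a 20 s clip, both recorded at 30 fps.
for seconds in (10, 20):
    n = seconds * 30
    print(seconds, len(uniform_sample_indices(n, 8)), len(fps_sample_indices(n, 30, 2)))
# prints:
# 10 8 20
# 20 8 40
```

With uniform sampling, both clips get 8 frames, so the time gap between frames doubles for the longer clip. With fps sampling, the gap stays at half a second and the frame count scales with duration, which keeps the model's sense of motion and timing consistent across videos.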
Smaller Models with Superior Performance
Apollo-3B scores 68.7 on the MLVU benchmark, outperforming larger 7B models. Apollo-7B scores 70.9, setting a new high among models of its size and even surpassing some 30B models.
The team also introduced ApolloBench, a more efficient evaluation suite for video understanding that cuts evaluation time by a factor of 41. “Our results prove that smart design and training can deliver top performance without relying on massive model sizes,” the researchers said.
Apollo marks a significant leap in video AI, opening doors to applications like content analysis and autonomous systems.
Aditi Suresh
Aditi is a political science graduate interested in technology, AI, social media, and online culture.