Cerebras CePo Brings Test Time Computation to Llama

Last updated December 11, 2024

Meta’s Llama 3.3 70B now outperforms Llama 405B, thanks to test time computation. 

Cerebras, the AI hardware and inference provider, announced a new technique called CePO (Cerebras Planning and Optimization) that it says ‘drastically’ improves the reasoning capabilities of Meta’s Llama models.

Cerebras applies the much-coveted test-time computation technique to the Llama 3.3 70B model, which then outperforms the Llama 3.1 405B model across several benchmarks while ‘maintaining interactive speeds of 100 tokens per second’. Cerebras has also released detailed technical documentation outlining CePO’s capabilities.
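The article does not spell out CePO’s exact algorithm, but test-time computation in its simplest form is a best-of-N loop: spend extra inference on several candidate answers, score each one, and keep the best. The sketch below is a toy illustration of that general idea only, not Cerebras’s actual pipeline; `generate_candidates` and its scores are invented stand-ins for real model calls.

```python
import random

def generate_candidates(prompt, n, seed=0):
    # Stand-in for sampling n candidate answers from an LLM.
    # A real pipeline would call a model API here; we return
    # (answer, score) pairs with pseudo-random scores instead.
    rng = random.Random(seed)
    return [(f"candidate-{i}", rng.random()) for i in range(n)]

def best_of_n(prompt, n=8):
    # Test-time computation, minimal form: pay for n samples
    # at inference time, then keep the highest-scoring one.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda c: c[1])[0]
```

The trade-off is straightforward: answer quality tends to improve with N, at the cost of N times the inference compute, which is why Cerebras emphasises sustaining 100 tokens per second while doing it.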

“While models like OpenAI o1 and Alibaba QwQ have demonstrated the power of additional computation at inference time, CePO brings these capabilities to Llama – the world’s most popular open-source LLM family,” said Cerebras in the announcement. 

Cerebras also compared its technique with GPT-4 Turbo and Claude 3.5 Sonnet, achieving ‘comparable performance’ on most benchmarks. However, no comparison was made with the industry-leading reasoning model, OpenAI’s o1.

For example, the Llama 3.3 70B model scored 53.3% on the GPQA benchmark, whereas the o1 model scored a higher 76%. While OpenAI hasn’t revealed the number of parameters in o1, it almost certainly has significantly more than 70B.

“By bringing these capabilities to the Llama family of models, we’re democratizing access to sophisticated reasoning techniques previously limited to closed commercial systems,” said Andrew Feldman, CEO and Co-founder of Cerebras Systems. 

Cerebras also plans to open-source the CePO framework. In addition, the company aims to develop more ‘advanced prompting frameworks that leverage comparative reasoning’ and synthetic datasets optimised for inference-time computing.

Cerebras is using Llama 3.3, the latest edition of Meta’s Llama, which Meta announced only a few days ago. According to Meta, the model delivers ‘leading performance’ in synthetic data generation and supports an expanded context length of 128k tokens.

A few days ago, Meta also unveiled a new ‘Chain of Continuous Thought’ technique, or COCONUT, which overcomes limitations of the Chain of Thought (CoT) technique, in which the explicit reasoning process is generated as natural-language tokens.

Instead of making the model convert its internal thinking into words after each step, COCONUT feeds the model’s continuous hidden state directly back as the starting point for the subsequent step.
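The difference can be sketched in a toy numpy model. This is not Meta’s implementation; the weights `W`, the embedding table `E`, and the `step` function are invented placeholders that only illustrate where the token bottleneck sits.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))   # toy "reasoning" weights
E = rng.standard_normal((10, 4))  # toy token embedding table

def step(h):
    # One toy reasoning step over a 4-dim hidden state.
    return np.tanh(W @ h)

def cot_step(h):
    # CoT: collapse the hidden state to a discrete token, then
    # re-embed that token as the next input -- information is
    # lost at the token bottleneck.
    token = int(np.argmax(E @ h))
    return step(E[token])

def coconut_step(h):
    # COCONUT: feed the continuous hidden state straight into
    # the next step, skipping the token bottleneck entirely.
    return step(h)
```

In `cot_step`, everything the hidden state carried beyond the single chosen token is discarded; `coconut_step` keeps the full continuous state, which is the core of the COCONUT idea.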

Reasoning models are the next big thing in the ecosystem today. While OpenAI just unveiled the full version of the o1 model, it also faces strong competition from the East: China’s DeepSeek R1 Lite supposedly offers better reasoning capability than o1 and is also available as an open-source model.


Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
