Meta Drops COCONUT, Breaks Chain of Thought Reasoning 

  • Last updated December 10, 2024
  • In AI News

Is it as good as OpenAI's o1, though?


Meta’s FAIR (Fundamental AI Research) team on Monday unveiled a new research study that explores a ‘Chain of Continuous Thought’ technique, dubbed COCONUT. 

The technique addresses a limitation of Chain of Thought (CoT) reasoning, in which the model’s explicit reasoning process is generated as natural language tokens. 

“Chain-of-thought (CoT) reasoning involves prompting or training LLMs to generate solutions step-by-step using natural language. However, this is in stark contrast to certain human cognition results,” said the researchers. 

By way of analogy, Meta cites neuroimaging studies showing that the ‘language network’, a set of brain regions responsible for language comprehension and production, remains largely inactive during various reasoning tasks. 

This creates a mismatch: the amount of reasoning needed varies from token to token with the complexity of the problem, yet LLMs allocate ‘nearly the same computing budget for predicting every token’.

Meta instead explores reasoning in an abstract, continuous space by modifying the CoT process. Rather than making the model convert its internal thinking into words after each step, COCONUT feeds that internal state directly back as the starting point for the subsequent step. 

“This modification frees the reasoning from being within the language space, and the system can be optimised end-to-end by gradient descent, as continuous thoughts are fully differentiable,” mentioned the authors. 
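The core idea, replacing the decode-to-token step with a direct hidden-state feedback, can be illustrated with a toy sketch. This is plain NumPy and not Meta’s implementation; the random weight matrices and the `step` function are stand-ins for a real transformer forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, VOCAB = 8, 16

# Stand-in weights for a tiny "model" (purely illustrative).
W_in = rng.normal(size=(D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
W_out = rng.normal(size=(D_MODEL, VOCAB)) / np.sqrt(D_MODEL)  # unembedding
E = rng.normal(size=(VOCAB, D_MODEL))                         # token embeddings

def step(x):
    """One forward step: input vector -> last hidden state."""
    return np.tanh(x @ W_in)

def cot_step(x):
    """Standard CoT: decode the hidden state to a discrete token, re-embed it."""
    h = step(x)
    token = int(np.argmax(h @ W_out))  # collapses the hidden state to one token
    return E[token], token

def coconut_step(x):
    """COCONUT-style step: feed the hidden state straight back as the next input."""
    return step(x)                     # no decode/re-embed; stays continuous

x = E[0]
for _ in range(3):
    x, tok = cot_step(x)        # information is lost at each discretization

x_cont = E[0]
for _ in range(3):
    x_cont = coconut_step(x_cont)  # continuous chain, differentiable end-to-end
```

Because the continuous chain never passes through an `argmax`, gradients can flow through every step, which is what the authors mean by the system being optimisable end-to-end by gradient descent.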

“Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference,” they added. 

A few days ago, OpenAI released the full version of the o1 model. A user on Reddit compared the $200 o1 Pro model against Claude 3.5 and said that the former is marginally better at reasoning and excels at PhD-level questions. 

OpenAI uses a combination of CoT and reinforcement learning techniques to help the model reason. 

Debarghya Das, a VC at Menlo Ventures, said on X, “Two weeks ago, research said no LLM could solve NYT Connections — a simple game where you group 16 words into 4 groups of 4.”

“o1-pro solves it consistently in one shot.”

Meta has also had its fair share of action over the last few days. It unveiled Llama 3.3 with 70 billion parameters, which is said to be as good as its flagship 405B-parameter model, yet optimised for efficiency. 

“As we continue to explore new post-training techniques, today we’re releasing Llama 3.3 — a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost,” said Meta in a post on X.

So, what’s next for Meta? The fourth iteration of the beloved Llama, it seems. 

Meta CEO Mark Zuckerberg said on the company’s Q3 2024 earnings call, “I expect that the smaller Llama 4 models will be ready first, and we expect [them] sometime early next year, and I think that they’re going to be a big deal on several fronts — new modalities, capabilities, stronger reasoning, and much faster,” confirming the release of Llama 4 next year. 

Ahmad Al-Dahle, VP of GenAI at Meta, said in a post on X, “Great to visit one of our data centres where we’re training Llama 4 models on a cluster bigger than 100K H100s! So proud of the incredible work we’re doing to advance our products, the AI field and the open-source community.”



Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
