New Open-Source Champion Reflection 70B Outperforms GPT-4o and Claude Sonnet 3.5

7 months ago 92

Last updated September 6, 2024
In AI News

The Reflection-Tuning technique allows Reflection 70B to detect and correct its own mistakes before finalising an answer.

Illustration by Illustrated by Nikhil Kumar

Matt Shumer, co-founder and CEO of AI writing startup HyperWrite recently launched a new model called Reflection 70B.

I'm excited to announce Reflection 70B, the world’s top open-source model.

Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.

405B coming next week – we expect it to be the best model in the world.

Built w/ @GlaiveAI.

Read on ⬇️: pic.twitter.com/kZPW1plJuo

— Matt Shumer (@mattshumer_) September 5, 2024

The model has emerged as a leading open-source language model, outperforming top closed-source models like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5. The model, developed using a novel technique called Reflection-Tuning, showcases significant improvements in benchmark tests, including MMLU, MATH, IFEval, and GSM8K.

The Reflection-Tuning technique allows Reflection 70B to detect and correct its own mistakes before finalising an answer. This advancement aims to address the common issue of model hallucinations and improve reasoning accuracy.

The model outputs its internal reasoning in <thinking> tags and final answers in <output> tags, with additional <reflection> tags used for correcting any detected errors.

Currently, Reflection 70B holds the top position in several benchmarks and demonstrates superior performance over GPT-4o and Llama 3.1 405B. The upcoming Reflection 405B model, expected next week, is anticipated to further elevate the standard for LLMs globally.

This is second model this week outperforming GPT-4o and Claude Sonnet 3.5

Alibaba recently released Qwen2-VL, the latest model in its vision-language series. The new model can chat via camera, play card games, and control mobile phones and robots by acting as an agent. It is available in three versions: the open source 2 billon and 7 billion models, and the more advanced 72 billion model, accessible using API.

The advanced 72 billion model of Qwen2-VL achieved SOTA visual understanding across 20 benchmarks. “Overall, our 72B model showcases top-tier performance across most metrics, often surpassing even closed-source models like GPT-4o and Claude 3.5-Sonnet,”said the company in a blog post, saying that it demonstrates a significant edge in document understanding.

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.