DeepSeek to Release Open-Source Model With Enhanced Reward Modeling Techniques

  • Published on April 9, 2025
  • In AI News

DeepSeek-GRM is a 27B-parameter model based on Gemma-2 27B. (Illustration by Supreeth Koundinya)

DeepSeek AI, in collaboration with Tsinghua University, has unveiled a research study on improving reward modelling in large language models through increased inference-time compute. The research produced a model named DeepSeek-GRM, which the company says will be released as open source.

The authors propose a novel method called Self-Principled Critique Tuning (SPCT) to develop scalable reward generation behaviours in generative reward models (GRMs).

Simply put, the method trains a reward model to write its own guiding principles for each query and then critique candidate responses against them, making its evaluations more effective across various types of tasks.
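
As a rough illustration, here is a minimal Python sketch of that principle-then-critique flow. It is not DeepSeek's implementation: generate() is a hypothetical stand-in for a call to any instruction-tuned LLM, and the prompt wording and score format are assumptions based on the paper's description.

```python
import re

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to any instruction-tuned LLM."""
    raise NotImplementedError

def judge(query: str, response_a: str, response_b: str) -> dict:
    # Step 1: the model writes its own evaluation principles for this query
    # (the "self-principled" part of SPCT).
    principles = generate(
        "List the principles that matter most when judging answers to this "
        f"query:\n{query}"
    )
    # Step 2: it critiques both candidate responses against those principles
    # and finishes with numeric scores in a parseable format.
    critique = generate(
        f"Principles:\n{principles}\n\nQuery: {query}\n\n"
        f"Response A: {response_a}\n\nResponse B: {response_b}\n\n"
        "Critique both responses against the principles, then end with a "
        "line of the form 'Scores: A=<1-10> B=<1-10>'."
    )
    match = re.search(r"Scores:\s*A=(\d+)\s*B=(\d+)", critique)
    scores = (int(match.group(1)), int(match.group(2))) if match else (0, 0)
    return {"principles": principles, "critique": critique, "scores": scores}
```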

DeepSeek-GRM is a 27-billion-parameter model built on Google's open-source Gemma-2-27B and post-trained with SPCT. To scale performance further, the research proposes sampling multiple reward judgements in parallel at inference time, utilising more computing power to improve accuracy.
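
The sampling idea can be sketched in the same hypothetical terms: draw several independent judgements in parallel and let them vote. This reuses the judge() function from the sketch above; the plain score-sum aggregation is a simplification of the paper's voting scheme, which also introduces a meta reward model to filter out low-quality samples before voting.

```python
from concurrent.futures import ThreadPoolExecutor

def scaled_judge(query: str, response_a: str, response_b: str, k: int = 8):
    # Draw k independent principle-critique-score samples in parallel.
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(judge, query, response_a, response_b)
                   for _ in range(k)]
        samples = [f.result() for f in futures]
    # Aggregate by summing scores across samples: a simple form of voting
    # that lets extra inference-time compute sharpen the final verdict.
    total_a = sum(s["scores"][0] for s in samples)
    total_b = sum(s["scores"][1] for s in samples)
    return ("A" if total_a >= total_b else "B"), (total_a, total_b)
```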

DeepSeek-GRM-27B posted consistently strong results across diverse reward modelling benchmarks. The research paper discusses the benchmark scores and the methodology in depth.

A few weeks ago, DeepSeek released an update to its DeepSeek-V3 model. The updated model ‘DeepSeek V3-0324’ currently ranks highest in benchmarks among all non-reasoning models. 

Artificial Analysis, a platform that benchmarks AI models, stated, “This is the first time an open weights model is the leading non-reasoning model, marking a milestone for open source.” The model scored the highest points among all non-reasoning models on the platform’s ‘Intelligence Index’. 

Recently, Reuters reported that DeepSeek plans to release R2 “as early as possible”. The company initially intended to launch it in early May but is now contemplating an earlier timeline. 

The model is expected to produce “better coding” and to reason in languages beyond English.

DeepSeek-R2 will be the successor to the DeepSeek-R1 reasoning model, which caused quite a storm in both the AI ecosystem and the markets.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
