DeepSeek to Release Open-Source Model With Enhanced Reward Modeling Techniques

  • Published on April 9, 2025
  • In AI News

DeepSeek-GRM is a 27B-parameter model based on Gemma-2 27B. (Illustration by Supreeth Koundinya)

DeepSeek AI, in collaboration with Tsinghua University, has unveiled a research study on improving reward modelling in large language models through increased inference-time compute. The research produced a model named DeepSeek-GRM, which the company says will be released as open source.

The authors propose a novel method called Self-Principled Critique Tuning (SPCT) to develop scalable reward generation behaviours in generative reward models (GRMs).

Simply put, the method trains a reward model to write its own guiding principles for each query and then critique candidate responses against them, making its evaluations more effective across various types of tasks.
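
As a rough illustration, here is a minimal Python sketch of that principle-then-critique flow. It is not DeepSeek's implementation: generate() is a hypothetical stand-in for a call to any instruction-tuned LLM, and the prompt wording and score format are assumptions based on the paper's description.

```python
import re

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to any instruction-tuned LLM."""
    raise NotImplementedError

def judge(query: str, response_a: str, response_b: str) -> dict:
    # Step 1: the model writes its own evaluation principles for this query
    # (the "self-principled" part of SPCT).
    principles = generate(
        "List the principles that matter most when judging answers to this "
        f"query:\n{query}"
    )
    # Step 2: it critiques both candidate responses against those principles
    # and finishes with numeric scores in a parseable format.
    critique = generate(
        f"Principles:\n{principles}\n\nQuery: {query}\n\n"
        f"Response A: {response_a}\n\nResponse B: {response_b}\n\n"
        "Critique both responses against the principles, then end with a "
        "line of the form 'Scores: A=<1-10> B=<1-10>'."
    )
    match = re.search(r"Scores:\s*A=(\d+)\s*B=(\d+)", critique)
    scores = (int(match.group(1)), int(match.group(2))) if match else (0, 0)
    return {"principles": principles, "critique": critique, "scores": scores}
```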

DeepSeek-GRM is a 27-billion-parameter model built on Google's open-source Gemma-2-27B and post-trained with SPCT. To scale performance further, the research proposes sampling multiple reward judgements in parallel at inference time, utilising more computing power to improve accuracy.
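
The sampling idea can be sketched in the same hypothetical terms: draw several independent judgements in parallel and let them vote. This reuses the judge() function from the sketch above; the plain score-sum aggregation is a simplification of the paper's voting scheme, which also introduces a meta reward model to filter out low-quality samples before voting.

```python
from concurrent.futures import ThreadPoolExecutor

def scaled_judge(query: str, response_a: str, response_b: str, k: int = 8):
    # Draw k independent principle-critique-score samples in parallel.
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(judge, query, response_a, response_b)
                   for _ in range(k)]
        samples = [f.result() for f in futures]
    # Aggregate by summing scores across samples: a simple form of voting
    # that lets extra inference-time compute sharpen the final verdict.
    total_a = sum(s["scores"][0] for s in samples)
    total_b = sum(s["scores"][1] for s in samples)
    return ("A" if total_a >= total_b else "B"), (total_a, total_b)
```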

DeepSeek-GRM-27B posted consistently strong results across diverse reward modelling benchmarks. The research paper discusses the benchmark scores and the methodology in depth.

A few weeks ago, DeepSeek released an update to its DeepSeek-V3 model. The updated model ‘DeepSeek V3-0324’ currently ranks highest in benchmarks among all non-reasoning models. 

Artificial Analysis, a platform that benchmarks AI models, stated, “This is the first time an open weights model is the leading non-reasoning model, marking a milestone for open source.” The model scored the highest points among all non-reasoning models on the platform’s ‘Intelligence Index’. 

Recently, Reuters reported that DeepSeek plans to release R2 “as early as possible”. The company initially intended to launch it in early May but is now contemplating an earlier timeline. 

The model is expected to produce “better coding” and to reason in languages beyond English.

DeepSeek-R2 will be the successor to the DeepSeek-R1 reasoning model, which caused quite a storm in both the AI ecosystem and the markets.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
