- Published on April 9, 2025
- In AI News
DeepSeek-GRM is a 27-billion-parameter model based on Google's Gemma-2-27B.

DeepSeek AI, in collaboration with Tsinghua University, has released a new research study on improving reward modelling in large language models with more inference-time compute. The research produced a model named DeepSeek-GRM, which the company says will be released as open source.
The authors propose a novel method called Self-Principled Critique Tuning (SPCT) to develop scalable reward generation behaviours in generative reward models (GRMs).
Simply put, the method teaches a model to generate its own guiding principles and critiques for each query and response it evaluates, which improves the quality of its self-evaluation across various types of tasks.
DeepSeek-GRM is a 27-billion-parameter model post-trained with SPCT on top of Google's open-source Gemma-2-27B. To scale performance further, the research proposes sampling multiple reward judgements in parallel at inference time and aggregating them, trading extra compute for better results.
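To illustrate the idea, here is a minimal sketch of that inference-time scaling loop, assuming a hypothetical `grm.generate` call that returns a text judgement (principles, critique, and a final integer score). The names and the score format are illustrative assumptions, not the paper's actual code or API.

```python
import re

def parse_score(judgement: str) -> int:
    """Pull the final integer score out of a generated judgement.
    Assumes (hypothetically) the judgement ends with 'Score: <n>'."""
    match = re.search(r"Score:\s*(-?\d+)", judgement)
    return int(match.group(1)) if match else 0

def scaled_reward(grm, query: str, response: str, k: int = 8) -> int:
    """Sample k independent judgements for the same input and
    aggregate by summing (voting over) the extracted scores.
    Larger k spends more inference-time compute on one input."""
    scores = []
    for _ in range(k):
        # `grm.generate` is a placeholder for one sampled judgement;
        # non-zero temperature makes the k samples diverse.
        judgement = grm.generate(query, response, temperature=1.0)
        scores.append(parse_score(judgement))
    return sum(scores)  # higher total => stronger preference
```

In this sketch, the aggregation is a simple sum of sampled scores; the paper also explores guiding the vote with an additional meta reward model.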
DeepSeek-GRM-27B consistently achieved strong results across diverse reward modelling benchmarks. The research paper discusses the benchmark scores and the methodology in depth.
A few weeks ago, DeepSeek released an update to its DeepSeek-V3 model. The updated model, 'DeepSeek-V3-0324', currently ranks highest in benchmarks among all non-reasoning models.
Artificial Analysis, a platform that benchmarks AI models, stated, “This is the first time an open weights model is the leading non-reasoning model, marking a milestone for open source.” The model scored highest among all non-reasoning models on the platform’s ‘Intelligence Index’.
Recently, Reuters reported that DeepSeek plans to release R2 “as early as possible”. The company initially intended to launch it in early May but is now contemplating an earlier timeline.
The model is expected to produce “better coding” and to reason in languages beyond English.
DeepSeek-R2 will be the successor to the DeepSeek-R1 reasoning model, which created quite a storm in both the AI ecosystem and the markets.
Supreeth Koundinya
Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.