- Published on February 26, 2025
- In AI News
In a first-of-its-kind AI research paper from India, researchers from LossFunk, alongside IIT Roorkee, have introduced implicit preference optimisation (IPO)—a novel approach to aligning LLMs with desired preferences without external human feedback or reward models.
The results: models trained with IPO performed comparably to those utilising state-of-the-art (SOTA) reward models.
The researchers include Shivank Garg, Ayush Singh, and Shweta Singh from IIT Roorkee, along with Paras Chopra, founder of AI startup LossFunk (previously Turing’s Dream).
“LLM post-training requires a reward model for preference alignment. But is it necessary? In our new preprint, we show that the language model is itself a preference classifier & reward model isn’t needed,” said Chopra in a post on X.
In the research paper ‘IPO: Your Language Model is Secretly a Preference Classifier,’ the researchers said that their new technique offers a more efficient and scalable method for aligning LLMs with human preferences by reducing dependence on human-labelled data and external reward models. The team believes that this advancement could lead to more responsive and adaptable AI systems across various applications.
Further, they said that the conventional technique for aligning LLMs, the likes of reinforcement learning from human feedback (RLHF), depends heavily on human-generated data to train reward models that guide the models’ outputs, which is both costly and time-consuming.
In contrast, their new IPO method uses the inherent capabilities of generative LLMs to function as preference classifiers, thereby minimising the need for external feedback mechanisms.
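The idea of an LLM acting as its own preference classifier can be sketched as follows. This is a hypothetical illustration, not the authors' code: the prompt template, the `generate` callable, and the A/B parsing are all assumptions about how such a classifier might be wired up.

```python
# Hypothetical sketch: using an LLM itself as a pairwise preference
# classifier. `generate` stands in for any LLM completion call
# (an assumption, not the paper's implementation).

PROMPT = """You are given an instruction and two candidate responses.

Instruction: {instruction}

Response A: {a}

Response B: {b}

Which response is better? Answer with a single letter, A or B."""


def classify_preference(generate, instruction, a, b):
    """Return 0 if the model prefers response `a`, 1 if it prefers `b`."""
    answer = generate(PROMPT.format(instruction=instruction, a=a, b=b))
    # Parse the model's reply leniently: anything starting with "A" wins for `a`.
    return 0 if answer.strip().upper().startswith("A") else 1
```

In practice, `generate` would wrap a call to the model being aligned, so the same network both produces candidate responses and judges between them.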
To evaluate the effectiveness of their approach, the researchers conducted comprehensive tests using RewardBench, a benchmark designed to assess preference classification abilities across various models. They examined models of different sizes, architectures, and training levels.
“We show that our method is superior to using LLM-as-judge (as in the ‘self-rewarding’ approach), as measured on RewardBench, which has ground-truth labels for good and bad responses. You can clearly see that in some cases we get 90%+ accuracy on RewardBench!” Chopra added.
A significant aspect of the study involved exploring the self-improvement capabilities of LLMs. The team generated multiple responses to given instructions and employed the model as a preference classifier within a Direct Preference Optimisation (DPO) framework. This approach allowed the model to refine its outputs without external intervention.
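The self-improvement loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `sample_response` and `prefers` are hypothetical stand-ins for the model's generation call and its own preference judgement (as in the classifier sketch), and the pairwise-voting ranking is one plausible way to pick chosen/rejected pairs, not necessarily the authors' exact procedure.

```python
# Minimal sketch of self-improvement via self-judged DPO pairs:
# sample several responses per instruction, rank them with the model's
# own pairwise preferences, and keep (chosen, rejected) records in the
# prompt/chosen/rejected format commonly used for DPO training.

def build_dpo_pairs(instructions, sample_response, prefers, n_samples=4):
    """Return a list of {"prompt", "chosen", "rejected"} records."""
    dataset = []
    for inst in instructions:
        candidates = [sample_response(inst) for _ in range(n_samples)]
        # Score each candidate by how many pairwise comparisons it wins.
        wins = [0] * len(candidates)
        for i in range(len(candidates)):
            for j in range(i + 1, len(candidates)):
                if prefers(inst, candidates[i], candidates[j]) == 0:
                    wins[i] += 1
                else:
                    wins[j] += 1
        best = max(range(len(candidates)), key=wins.__getitem__)
        worst = min(range(len(candidates)), key=wins.__getitem__)
        dataset.append({"prompt": inst,
                        "chosen": candidates[best],
                        "rejected": candidates[worst]})
    return dataset
```

The resulting records can then feed a standard DPO training step, closing the loop without any external reward model or human labels.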
“Our findings demonstrate that models trained through IPO achieve performance comparable to those utilising state-of-the-art reward models for obtaining preferences,” the researchers noted.
This comes amid LossFunk’s mission to build a state-of-the-art foundational reasoning model from India, with the company inviting applicants to join the effort.
At MLDS, India’s biggest summit for developers, Chopra said that for India to develop a state-of-the-art foundation model, sheer compute power might not be the most effective solution.
“The human brain is an incredibly efficient AGI. It runs on potatoes. You don’t need a nuclear-powered data centre to operate an AGI,” he said.
Pointing to ISRO’s accomplishments across several missions at a lower cost than NASA’s, he added that India can do the same in AI. “As a nation, we don’t have to look too far to see the amazing things we’ve already accomplished. We’ve done it in areas like space, and there’s no reason why we can’t do the same in AI.”
“Creativity is born out of constraints, and DeepSeek’s success proves that with the right approach, it’s possible to innovate and scale AI models without relying on endless financial resources,” Chopra further said.
Chopra recently sold Wingify, his Delhi-based SaaS startup, which was acquired by private equity firm Everstone for $200 million (approximately ₹1,600 crore).
Siddharth Jindal
Siddharth is a media graduate who loves to explore tech through journalism, putting forward ideas worth pondering in the era of artificial intelligence.