Yoshua Bengio Proposes ‘Scientist AI’ to Mitigate Catastrophic Risks from Superintelligent Agents

Published on February 25, 2025
In AI News

Scientist AI is a “system designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans”.

Turing Award winner Yoshua Bengio, along with a group of AI researchers, on Monday proposed ‘Scientist AI’. This AI system is designed to accelerate scientific progress and research while functioning as a guardrail to protect against “unsafe agentic AIs”.

The authors examined the shortcomings of building AI systems that model human cognition. They said, “Human-like agency in AI systems could reproduce and amplify harmful human tendencies, potentially with catastrophic consequences.”

They added that combining the power of AI agents (systems designed to autonomously pursue goals) with superhuman capabilities could “enable dangerous, rogue AI systems”. This led to the proposal of ‘Scientist AIs’, which are meant to understand the world and make inferences from that knowledge, instead of just pursuing intended goals.

“In contrast to an agentic AI, which is trained to pursue a goal, a Scientist AI is trained to provide explanations for events along with their estimated probability,” said the authors.
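As a rough illustration of what “explanations along with their estimated probability” could look like, the toy sketch below ranks a few candidate explanations for one observation using Bayes' rule. It is not taken from the report: the function name `posterior`, the wet-grass scenario, and all the numbers are hypothetical, chosen only to show the idea of attaching probabilities to explanations rather than picking an action.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a finite set of candidate explanations.

    priors:      explanation -> prior probability (hypothetical values)
    likelihoods: explanation -> P(observation | explanation)
    """
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


# Hypothetical observation: "the grass is wet this morning".
priors = {"rain": 0.3, "sprinkler": 0.2, "neither": 0.5}
likelihoods = {"rain": 0.9, "sprinkler": 0.8, "neither": 0.05}

result = posterior(priors, likelihoods)

# Report each explanation with its estimated probability, most likely first.
for explanation, prob in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{explanation}: {prob:.3f}")
```

The output is a ranked list of explanations with probabilities, and nothing in the code chooses or executes an action. That is the distinction the authors draw with agentic systems.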

Moreover, the system aims to avoid the risks of reinforcement learning, a training method in which a system learns to maximise its long-term cumulative reward – which the authors say can “easily lead to goal misspecification and misgeneralisation”.
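For readers unfamiliar with the term, “long-term cumulative reward” is usually computed as a discounted sum of the rewards an agent collects over time. The minimal sketch below computes that sum for two made-up reward sequences, as a simplified picture of how goal misspecification can arise. The function name, the discount factor of 0.9, and both sequences are assumptions for illustration and do not come from the report.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, with each later step discounted by a factor of gamma."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences under an imperfect, proxy reward signal:
# 'intended' completes the real task and is rewarded only at the end;
# 'shortcut' exploits the proxy and collects a small reward at every step.
intended = [0.0, 0.0, 1.0]
shortcut = [0.5, 0.5, 0.5]

intended_return = discounted_return(intended)
shortcut_return = discounted_return(shortcut)

# A pure reward maximiser prefers whichever behaviour scores higher.
preferred = "shortcut" if shortcut_return > intended_return else "intended"
print(f"intended={intended_return:.3f} shortcut={shortcut_return:.3f} preferred={preferred}")
```

In this toy case the reward maximiser favours the shortcut even though the intended behaviour is the one the designer wanted.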

The proposed system is not trained to maximise rewards but to explain the world from observations, instead of taking actions to imitate or please humans. Based on its knowledge of the world, the system provides reliable explanations for its outputs, and humans or another AI system can examine why each argument is justified, analogous to a peer review.

To avoid self-fulfilling predictions, the authors said, “Predictions can be made in a conjectured setting of the simulated world in which the Scientist AI either does not exist or does not affect the rest of the world.”

Scientist AI is also said to become safer and more accurate with more compute – unlike traditional systems, which, according to the authors, “tend to become more susceptible to misalignment and deceptive behaviour as they are trained with more compute”.

“We hope these arguments will motivate researchers, developers, and policymakers to favour this safer path,” said the authors.

The detailed 58-page report can be found here.

Bengio, along with Yann LeCun and Geoffrey Hinton, received the 2018 ACM A.M. Turing Award, often regarded as the ‘Nobel Prize of Computing’. The trio is widely recognised for their foundational work on deep learning.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.