‘AI Can’t Replace a Human Code Reviewer’


From vibe coding to vibe debugging, AI coding has taken the internet by storm for reasons good and bad. Dario Amodei, CEO of Anthropic, recently predicted that AI would write 90% of code in 3–6 months, sending human programmers into a tizzy.

However, there have been pacifying voices too. “It needs to be said: AI can’t fully replace human code review,” began Greg Foster in a company blog post. Foster is the co-founder and CTO at Graphite.dev, an AI code review platform.

The Differences in Code Generation and Code Review

Foster explained a fundamental difference between code creation and code review. When AI is asked to generate a function or spin up a web page, the output can be evaluated quickly and verified easily, which is what has made vibe coding viable. On code review, however, he believes an LLM will only do an ‘okay’ job.

He shared the example of a GitHub pull request, which he evaluated with the help of ChatGPT. Foster found that the LLM highlighted some good parts, flagged some issues, and added general suggestions to help improve the code.

Such an evaluation by an AI tool should serve only as a first pass to speed things up, not as the sole basis for shipping code to production.
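As a rough illustration of the "first pass" idea, the sketch below treats AI reviews as advisory and lets a change ship only when a human has signed off. The `Review` class and `can_ship` function are hypothetical names invented for this example; they are not taken from Graphite or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Review:
    reviewer: str    # who left the review (hypothetical field)
    is_human: bool   # False when the review came from an AI tool
    approved: bool   # True if the reviewer signed off


def can_ship(reviews: list[Review]) -> bool:
    """Allow shipping only if every human review approves and at least one exists.

    AI reviews are ignored for the final decision: they are a first pass,
    not an approval.
    """
    human_reviews = [r for r in reviews if r.is_human]
    if not human_reviews:
        return False  # an AI approval alone is never enough
    return all(r.approved for r in human_reviews)
```

Under this assumed policy, a pull request approved only by an AI reviewer stays blocked, while the same pull request with an additional human approval can go out.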

Foster described experiments run at Graphite over the years, in which the team varied the context window size, tool usage, and false-positive calibration of AI-based reviewers. Drawing on these, he emphasised that no amount of progress should lead anyone to leave the final call on code review to LLMs.

“Engineers are so much more than just code-machines. The more AI writes the code, the more valuable it’ll be to have expert engineers reviewing it, deploying it, and iterating on it,” Foster added.

The Role of Context in Code

Foster highlighted that LLMs are only as good as the data and references fed to them. A good AI code reviewer might have the PR title, description, and diff, access to the full codebase, links to historical PRs and comments, the ability to go through Google Docs, Slack, and Notion for design specs, and a web search for library documentation.
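One way to picture that list is as a simple record of what the reviewer has been given. The sketch below is only an illustration: `ReviewContext`, its field names, and `missing_sources` are assumptions made for this example and do not describe how any real review tool stores its inputs.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ReviewContext:
    # Hypothetical fields mirroring the context sources listed above.
    title: str = ""
    description: str = ""
    diff: str = ""
    codebase_files: list[str] = field(default_factory=list)
    historical_prs: list[str] = field(default_factory=list)
    review_comments: list[str] = field(default_factory=list)
    design_docs: list[str] = field(default_factory=list)    # e.g. Google Docs, Slack, Notion
    library_docs: list[str] = field(default_factory=list)   # e.g. pages found via web search


def missing_sources(ctx: ReviewContext) -> list[str]:
    """Return the names of context sources that are empty for this pull request."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name)]
```

Calling `missing_sources(ReviewContext(title="Fix login", diff="+ x"))` would list every source other than the title and diff, which makes it easy to see how thin the input to a review can be.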

He explained that even with all these capabilities, AI will not know everything, such as a product roadmap that shifted after a meeting with a customer, subjective bias, and other strategic factors.

He emphasised that the human and machine context combined is always greater than that of the machine alone.

Biswajeet Parija, data scientist at Bristol Myers Squibb, told AIM, “AI struggles to grasp the nuanced context of a project, including its specific goals, architectural design, and business logic. It might flag code as problematic based on general rules, overlooking valid implementations within a specific context.”

To get more perspective, AIM spoke to Sulaiman Mudimala, founder of Bezu AI. “While AI tools excel at detecting syntax errors or patterns, human code review remains irreplaceable,” he said.

AI Doesn’t Exhibit Learning, Collaboration, and Accountability

Foster said that code review is not just for correcting code; it is also a medium for teaching new hires about coding standards, cultural norms, and best practices. However, he added that AI would not jump on a video call to discuss an alternative approach or wait for someone’s insights. That kind of discussion is only possible with humans who care about the codebase.

Besides, if one hands over the keys to AI, the chain of responsibility is lost, and no one can be held accountable for errors or bad security practices any more.

“Humans bring critical thinking, ethical judgment, and the ability to navigate ambiguous requirements—elements that AI lacks. Code isn’t just about functionality; it’s about collaboration, creativity, and aligning with real-world impact, which requires human intuition,” Mudimala told AIM.

Parija believes, “Developers must retain the final say in code change, and AI should serve as a tool to augment, not replace human judgment.” Citing the example of radar systems, he said that AI might miss subtle patterns that often require the cognitive flexibility and contextual awareness of a human to judge.

Dipanjan Dey, co-founder and CEO of Kombai, an AI developer tool, told AIM, “Real-world code reviews require deep understanding of complex contexts and nuanced judgment about tradeoffs. These qualities remain uniquely human.”

“Most teams also want a human to be accountable for the final review before the code goes to production. AI tools can, however, provide valuable help to human code reviewers, much like with other development tasks,” he added.
