Realtime AI News

LLM-Based Peer Review Survey: Fluency Is Not Enough, Reliability Remains a Challenge

A comprehensive survey finds that while LLMs can generate fluent peer review critiques, their reliability, robustness, and security as decision-support systems remain poorly understood.

Published Jun 25, 2026, 12:00 Beijing time

A new survey published on arXiv systematically reviews the state of LLM-assisted scientific peer review. The rapid growth of scientific submissions has pushed traditional peer review toward its scalability limits, motivating exploration of LLMs as automated evaluation assistants.

The survey covers existing methods, benchmarks, and reliability challenges. It finds that while LLMs can produce fluent critiques and approximate reviewer scores, critical issues of reliability, robustness, and security remain insufficiently addressed. These gaps are particularly concerning in a domain where errors can affect research funding and career decisions.

Published on arXiv cs.CL on June 25, 2026, this survey provides a system-level perspective on the field, serving as a roadmap for researchers working on AI-assisted peer review.

Why it matters

The survey offers a system-level roadmap of LLM peer review research and highlights critical gaps in reliability, robustness, and security.

arXiv, Survey, Peer Review, Reliability

Sources

Source 1: https://arxiv.org/abs/2606.25057