Preference Optimization for Review Question Generation Improves Writing Quality

Karun Sharma; Vidushee Vats; Shengzhi LI; Yuxiang Wang; Zhongtian Sun; Prayag Tiwari

Abstract

Peer review relies on substantive, evidence-based questions, yet current LLMs generate surface-level queries that perform worse than human reviewer questions in expert evaluation. To address this gap, we curate a high-quality dataset of reviewer questions from OpenReview and conduct a human preference study where expert annotators evaluate question-paper pairs across three dimensions: effort, evidence, and grounding. From these annotations, we train IntelliReward, a reward model built from a frozen autoregressive LLM with trainable multi-head transformers. Validated against expert judgments, IntelliReward predicts reviewer-question quality better than API-based SFT baselines and provides scalable evaluation. We apply Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO) with IntelliReward to train IntelliAsk, a question-generation model aligned with human standards of effortful, evidence-based critique. Human evaluations show IntelliAsk generates more grounded, substantive and effortful questions than strong baselines and reduces reliance on first-page content. We also find improvements on reasoning and writing benchmarks, suggesting reviewer-question quality correlates with broader capabilities. Compared to Qwen3-32B, IntelliAsk improves MuSR (68.3 vs 64.7 Acc) and WritingBench (8.31 vs 8.07). We release our code, filtered review dataset, expert annotations, IntelliAsk and IntelliReward to support automatic evaluation of grounding, effort, and evidence in LLM-generated review questions.

Anthology ID: 2026.findings-acl.1256
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 25081–25104
URL: https://aclanthology.org/2026.findings-acl.1256/
Cite (ACL): Karun Sharma, Vidushee Vats, Shengzhi LI, Yuxiang Wang, Zhongtian Sun, and Prayag Tiwari. 2026. Preference Optimization for Review Question Generation Improves Writing Quality. In Findings of the Association for Computational Linguistics: ACL 2026, pages 25081–25104, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Preference Optimization for Review Question Generation Improves Writing Quality (Sharma et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1256.pdf
Checklist: 2026.findings-acl.1256.checklist.pdf
