Haoran Ranran Zhang
2024
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Jiangshu Du
|
Yibo Wang
|
Wenting Zhao
|
Zhongfen Deng
|
Shuaiqi Liu
|
Renze Lou
|
Henry Peng Zou
|
Pranav Narayanan Venkit
|
Nan Zhang
|
Mukund Srinath
|
Haoran Ranran Zhang
|
Vipul Gupta
|
Yinghui Li
|
Tao Li
|
Fei Wang
|
Qin Liu
|
Tianlin Liu
|
Pengzhi Gao
|
Congying Xia
|
Chen Xing
|
Cheng Jiayang
|
Zhaowei Wang
|
Ying Su
|
Raj Sanjay Shah
|
Ruohao Guo
|
Jing Gu
|
Haoran Li
|
Kangda Wei
|
Zihao Wang
|
Lu Cheng
|
Surangika Ranathunga
|
Meng Fang
|
Jie Fu
|
Fei Liu
|
Ruihong Huang
|
Eduardo Blanco
|
Yixin Cao
|
Rui Zhang
|
Philip S. Yu
|
Wenpeng Yin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Claim: This work is not advocating the use of LLMs for paper (meta-)reviewing. Instead, wepresent a comparative analysis to identify and distinguish LLM activities from human activities. Two research goals: i) Enable better recognition of instances when someone implicitly uses LLMs for reviewing activities; ii) Increase community awareness that LLMs, and AI in general, are currently inadequate for performing tasks that require a high level of expertise and nuanced judgment.This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?This study focuses on the topic of LLMs as NLP Researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with “deficiency” labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) “LLMs as Reviewers”, how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) “LLMs as Metareviewers”, how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.
Search
Co-authors
- Chen Xing 1
- Cheng Jiayang 1
- Congying Xia 1
- Eduardo Blanco 1
- Fei Liu 1
- show all...
- Fei Wang 1
- Haoran Li 1
- Henry Peng Zou 1
- Jiangshu Du 1
- Jie Fu 1
- Jing Gu 1
- Kangda Wei 1
- Lu Cheng 1
- Meng Fang 1
- Mukund Srinath 1
- Nan Zhang 1
- Pengzhi Gao 1
- Philip S. Yu 1
- Pranav Narayanan Venkit 1
- Qin Liu 1
- Raj Sanjay Shah 1
- Renze Lou 1
- Rui Zhang 1
- Ruihong Huang 1
- Ruohao Guo 1
- Shuaiqi Liu 1
- Surangika Ranathunga 1
- Tao Li 1
- Tianlin Liu 1
- Vipul Gupta 1
- Wenpeng Yin 1
- Wenting Zhao 1
- Yibo Wang 1
- Ying Su 1
- Yinghui Li 1
- Yixin Cao 1
- Zhaowei Wang 1
- Zhongfen Deng 1
- Zihao Wang 1