Raj Sanjay Shah


2024

pdf bib
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Jiangshu Du | Yibo Wang | Wenting Zhao | Zhongfen Deng | Shuaiqi Liu | Renze Lou | Henry Peng Zou | Pranav Narayanan Venkit | Nan Zhang | Mukund Srinath | Haoran Ranran Zhang | Vipul Gupta | Yinghui Li | Tao Li | Fei Wang | Qin Liu | Tianlin Liu | Pengzhi Gao | Congying Xia | Chen Xing | Cheng Jiayang | Zhaowei Wang | Ying Su | Raj Sanjay Shah | Ruohao Guo | Jing Gu | Haoran Li | Kangda Wei | Zihao Wang | Lu Cheng | Surangika Ranathunga | Meng Fang | Jie Fu | Fei Liu | Ruihong Huang | Eduardo Blanco | Yixin Cao | Rui Zhang | Philip S. Yu | Wenpeng Yin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Claim: This work is not advocating the use of LLMs for paper (meta-)reviewing. Instead, wepresent a comparative analysis to identify and distinguish LLM activities from human activities. Two research goals: i) Enable better recognition of instances when someone implicitly uses LLMs for reviewing activities; ii) Increase community awareness that LLMs, and AI in general, are currently inadequate for performing tasks that require a high level of expertise and nuanced judgment.This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?This study focuses on the topic of LLMs as NLP Researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with “deficiency” labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) “LLMs as Reviewers”, how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) “LLMs as Metareviewers”, how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.

pdf bib
Development of Cognitive Intelligence in Pre-trained Language Models
Raj Sanjay Shah | Khushi Bhardwaj | Sashank Varma
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Recent studies show evidence for emergent cognitive abilities in Large Pre-trained Language Models (PLMs). The increasing cognitive alignment of these models has made them candidates for cognitive science theories. Prior research into the emergent cognitive abilities of PLMs has been path-independent to model training, i.e. has only looked at the final model weights and not the intermediate steps. However, building plausible models of human cognition using PLMs also requires aligning their performance during training to the developmental trajectories of children’s thinking. Guided by psychometric tests of human intelligence, we choose four task categories to investigate the alignment of ten popular families of PLMs and evaluate each of their available intermediate and final training steps: Numerical ability, Linguistic abilities, Conceptual understanding, and Fluid reasoning. We find a striking regularity: regardless of model size, the developmental trajectories of PLMs consistently exhibit a window of maximal alignment to human cognitive development. Before that window, training appears to endow models with the requisite structure to be poised to rapidly learn from experience. After that window, training appears to serve the engineering goal of reducing loss but not the scientific goal of increasing alignment with human cognition.

2023

pdf bib
Pre-training LLMs using human-like development data corpus
Khushi Bhardwaj | Raj Sanjay Shah | Sashank Varma
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning