2024
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Cheng Jiayang, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, Jing Gu, Haoran Li, Kangda Wei, Zihao Wang, Lu Cheng, Surangika Ranathunga, Meng Fang, Jie Fu, Fei Liu, Ruihong Huang, Eduardo Blanco, Yixin Cao, Rui Zhang, Philip S. Yu, Wenpeng Yin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Claim: This work does not advocate the use of LLMs for paper (meta-)reviewing. Instead, we present a comparative analysis to identify and distinguish LLM activities from human activities, with two research goals: (i) to enable better recognition of instances in which someone implicitly uses LLMs for reviewing activities; (ii) to increase community awareness that LLMs, and AI in general, are currently inadequate for performing tasks that require a high level of expertise and nuanced judgment.
This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?
This study focuses on the topic of LLMs as NLP Researchers, particularly examining how effectively LLMs assist in paper (meta-)reviewing and how recognizable their output is. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready versions) with both human-written and LLM-generated reviews, and (ii) expert-annotated “deficiency” labels and corresponding explanations for individual segments of each review. Using ReviewCritique, this study explores two threads of research questions: (i) “LLMs as Reviewers”: how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) “LLMs as Metareviewers”: how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews?
To our knowledge, this is the first work to provide such a comprehensive analysis.
An Audit on the Perspectives and Challenges of Hallucinations in NLP
Pranav Narayanan Venkit, Tatiana Chakravorti, Vipul Gupta, Heidi Biggs, Mukund Srinath, Koustava Goswami, Sarah Rajtmajer, Shomir Wilson
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We audit how hallucination in large language models (LLMs) is characterized in peer-reviewed literature, using a critical examination of 103 publications across NLP research. Through this examination, we identify a lack of agreement on the term ‘hallucination’ in the field of NLP. Additionally, to complement our audit, we conduct a survey with 171 practitioners from the field of NLP and AI to capture varying perspectives on hallucination. Our analysis calls for explicit definitions and frameworks outlining hallucination within NLP and highlights potential challenges, and our survey inputs provide a thematic understanding of the influence and ramifications of hallucination in society.
Blind Spots and Biases: Exploring the Role of Annotator Cognitive Biases in NLP
Sanjana Gautam, Mukund Srinath
Proceedings of the Third Workshop on Bridging Human--Computer Interaction and Natural Language Processing
With the rapid proliferation of artificial intelligence, there is growing concern over its potential to exacerbate existing biases and societal disparities and introduce novel ones. This issue has prompted widespread attention from academia, policymakers, industry, and civil society. While evidence suggests that integrating human perspectives can mitigate bias-related issues in AI systems, it also introduces challenges associated with cognitive biases inherent in human decision-making. Our research focuses on reviewing existing methodologies and ongoing investigations aimed at understanding annotation attributes that contribute to bias.
Automated Detection and Analysis of Data Practices Using A Real-World Corpus
Mukund Srinath, Pranav Narayanan Venkit, Maria Badillo, Florian Schaub, C. Giles, Shomir Wilson
Findings of the Association for Computational Linguistics: ACL 2024
Privacy policies are crucial for informing users about data practices, yet their length and complexity often deter users from reading them. In this paper, we propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail. Leveraging crowd-sourced annotations from the ToS;DR platform, we experiment with various methods to match policy excerpts with predefined data practice descriptions. We further conduct a case study to evaluate our approach on a real-world policy, demonstrating its effectiveness in simplifying complex policies. Experiments show that our approach accurately matches data practice descriptions with policy excerpts, facilitating the presentation of simplified privacy information to users.
2023
The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis
Pranav Venkit, Mukund Srinath, Sanjana Gautam, Saranya Venkatraman, Vipul Gupta, Rebecca Passonneau, Shomir Wilson
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
We conduct an inquiry into the sociotechnical aspects of sentiment analysis (SA) by critically examining 189 peer-reviewed papers on their applications, models, and datasets. Our investigation stems from the recognition that SA has become an integral component of diverse sociotechnical systems, exerting influence on both social and technical users. By delving into sociological and technological literature on sentiment, we unveil distinct conceptualizations of this term in domains such as finance, government, and medicine. Our study exposes a lack of explicit definitions and frameworks for characterizing sentiment, resulting in potential challenges and biases. To tackle this issue, we propose an ethics sheet encompassing critical inquiries to guide practitioners in ensuring equitable utilization of SA. Our findings underscore the significance of adopting an interdisciplinary approach to defining sentiment in SA and offer a pragmatic solution for its implementation.
Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models
Pranav Narayanan Venkit, Mukund Srinath, Shomir Wilson
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the Bias Identification Test in Sentiment (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, and DistilBERT, and in two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.
2022
A Study of Implicit Bias in Pretrained Language Models against People with Disabilities
Pranav Narayanan Venkit, Mukund Srinath, Shomir Wilson
Proceedings of the 29th International Conference on Computational Linguistics
Pretrained language models (PLMs) have been shown to exhibit sociodemographic biases, such as against gender and race, raising concerns of downstream biases in language technologies. However, PLMs’ biases against people with disabilities (PWDs) have received little attention, in spite of their potential to cause similar harms. Using perturbation sensitivity analysis, we test an assortment of popular word embedding-based and transformer-based PLMs and show significant biases against PWDs in all of them. The results demonstrate how models trained on large corpora widely favor ableist language.
2021
Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies
Mukund Srinath, Shomir Wilson, C Lee Giles
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Organisations disclose their privacy practices by posting privacy policies on their websites. Even though internet users often care about their digital privacy, they usually do not read privacy policies, since understanding them requires a significant investment of time and effort. Natural language processing has been used to create experimental tools to interpret privacy policies, but there has been a lack of large privacy policy corpora to facilitate the creation of large-scale semi-supervised and unsupervised models to interpret and simplify privacy policies. Thus, we present the PrivaSeer Corpus of 1,005,380 English language website privacy policies collected from the web. The number of unique websites represented in PrivaSeer is about ten times larger than the next largest public collection of web privacy policies, and it surpasses the aggregate of unique websites represented in all other publicly available privacy policy corpora combined. We describe a corpus creation pipeline with stages that include a web crawler, language detection, document classification, duplicate and near-duplicate removal, and content extraction. We employ an unsupervised topic modelling approach to investigate the contents of policy documents in the corpus and discuss the distribution of topics in privacy policies at web scale. We further investigate the relationship between privacy policy domain PageRanks and text features of the privacy policies. Finally, we use the corpus to pretrain PrivBERT, a transformer-based privacy policy language model, and obtain state-of-the-art results on the data practice classification and question answering tasks.