Michael Galarnyk
2025
ConfReady: A RAG based Assistant and Dataset for Conference Checklist Responses
Michael Galarnyk | Rutwik Routu | Vidhyakshaya Kannan | Kosha Bheda | Prasun Banerjee | Agam Shah | Sudheer Chava
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
The ARR Responsible NLP Research checklist website states that the “checklist is designed to encourage best practices for responsible research, addressing issues of research ethics, societal impact and reproducibility.” Answering the questions is an opportunity for authors to reflect on their work and make sure any shared scientific assets follow best practices. Ideally, considering a checklist before submission can favorably impact the writing of a research paper. However, previous research has shown that self-reported checklist responses don’t always accurately represent papers. In this work, we introduce ConfReady, a retrieval-augmented generation (RAG) application that empowers authors to reflect on their work and assists them with conference checklists. To evaluate checklist assistants, we curate a dataset of 1,975 ACL checklist responses, analyze problems in human answers, and benchmark RAG and Large Language Model (LM)-based systems on an evaluation subset. Our code is released under the AGPL-3.0 license on GitHub, with documentation covering the user interface and PyPI package.
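For context, a checklist assistant of this kind typically retrieves the paper passages most relevant to a checklist question before prompting a model to draft a response. The sketch below is an illustrative assumption, not the ConfReady implementation; the encoder model and the `llm_generate` callable are placeholders.

```python
# Minimal RAG sketch for drafting a checklist response (illustrative only).
from sentence_transformers import SentenceTransformer, util

def answer_checklist_question(question, paper_chunks, llm_generate, top_k=3):
    """Retrieve the most relevant paper passages, then ask an LLM to draft a response."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
    q_emb = encoder.encode(question, convert_to_tensor=True)
    chunk_embs = encoder.encode(paper_chunks, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, chunk_embs)[0]
    top_idx = scores.topk(min(top_k, len(paper_chunks))).indices.tolist()
    context = "\n\n".join(paper_chunks[i] for i in top_idx)
    prompt = (
        f"Checklist question: {question}\n\n"
        f"Relevant excerpts from the paper:\n{context}\n\n"
        "Draft a concise checklist response grounded in the excerpts."
    )
    return llm_generate(prompt)  # llm_generate: any text-generation callable
```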
How Inclusively do LMs Perceive Social and Moral Norms?
Michael Galarnyk | Agam Shah | Dipanwita Guhathakurta | Poojitha Nandigam | Sudheer Chava
Findings of the Association for Computational Linguistics: NAACL 2025
**This paper discusses and contains offensive content.** Language models (LMs) are used in decision-making systems and as interactive assistants. However, how well do the judgements these models make align with the diversity of human values, particularly regarding social and moral norms? In this work, we investigate how inclusively LMs perceive norms across demographic groups (e.g., gender, age, and income). We prompt 11 LMs on rules-of-thumb (RoTs) and compare their outputs with the existing responses of 100 human annotators. We introduce the Absolute Distance Alignment Metric (ADA-Met) to quantify alignment on ordinal questions. We find notable disparities in LM responses, with younger, higher-income groups showing closer alignment, raising concerns about the representation of marginalized perspectives. Our findings highlight the importance of further efforts to make LMs more inclusive of diverse human values. The code and prompts are available on GitHub under the CC BY-NC 4.0 license.
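One plausible reading of a metric like ADA-Met (an assumption for illustration; the paper gives the exact formulation) is a mean absolute distance between paired model and human answers on an ordinal scale, normalized so 0 means perfect alignment:

```python
# Illustrative absolute-distance alignment score on ordinal responses (assumed form).
import numpy as np

def ada_met(model_responses, human_responses, num_levels=5):
    """Average normalized absolute distance between paired ordinal responses."""
    model = np.asarray(model_responses, dtype=float)
    human = np.asarray(human_responses, dtype=float)
    return float(np.mean(np.abs(model - human)) / (num_levels - 1))

# Example on a 5-point agreement scale (1 = strongly disagree ... 5 = strongly agree)
print(ada_met([4, 5, 2], [3, 5, 4]))  # 0.25
```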