Samuel Dahan

2025

Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback
Chu Fei Luo | Samuel Dahan | Xiaodan Zhu
Findings of the Association for Computational Linguistics: EMNLP 2025

As language models have a greater impact on society, it is important to ensure they are aligned to a diverse range of perspectives and are able to reflect nuance in human values. However, the most popular training paradigms for modern language models often assume there is one optimal answer for every query, leading to generic responses and poor alignment. In this work, we aim to enhance pluralistic alignment of language models in a low-resource setting with two methods: pluralistic decoding and model steering. We empirically demonstrate that model steering offers consistent improvement over zero-shot and few-shot baselines with only 50 annotated samples. Our proposed methods decrease false positives in several high-stakes tasks such as hate speech detection and misinformation detection, and improve the distributional alignment to human values from different demographics. We hope our work highlights the importance of diversity and how language models can be adapted to consider nuanced perspectives.

2024

Misinformation with Legal Consequences (MisLC): A New Task Towards Harnessing Societal Harm of Misinformation
Chu Fei Luo | Radin Shayanfar | Rohan V Bhambhoria | Samuel Dahan | Xiaodan Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024

Misinformation, defined as false or inaccurate information, can result in significant societal harm whether it is spread maliciously or unintentionally. The rapid online information exchange necessitates advanced detection mechanisms to mitigate misinformation-induced harm. Existing research, however, has predominantly focused on the veracity of information, overlooking the legal implications and consequences of misinformation. In this work, we take a novel angle to consolidate the definition of misinformation detection using legal issues as a measurement of societal ramifications, aiming to bring interdisciplinary efforts to tackle misinformation and its consequences. We introduce a new task: Misinformation with Legal Consequences (MisLC), which leverages definitions from a wide range of legal domains covering 4 broader legal topics and 11 fine-grained legal issues, including hate speech, election laws, and privacy regulations. For this task, we advocate a two-step dataset curation approach that utilizes crowd-sourced checkworthiness and expert evaluations of misinformation. We provide insights about the MisLC task through empirical evidence, from the problem definition to experiments and expert involvement. While the latest large language models and retrieval-augmented generation are effective baselines for the task, we find they are still far from replicating expert performance.

2023

Legally Enforceable Hate Speech Detection for Public Forums
Chu Luo | Rohan Bhambhoria | Samuel Dahan | Xiaodan Zhu
Findings of the Association for Computational Linguistics: EMNLP 2023

Hate speech causes widespread and deep-seated societal issues. Proper enforcement of hate speech laws is key for protecting groups of people against harmful and discriminatory language. However, determining what constitutes hate speech is a complex task that is highly open to subjective interpretations. Existing works do not align their systems with enforceable definitions of hate speech, which can make their outputs inconsistent with the goals of regulators. This research introduces a new perspective and task for enforceable hate speech detection centred around legal definitions, and a dataset annotated on violations of eleven possible definitions by legal experts. Given the challenge of identifying clear, legally enforceable instances of hate speech, we augment the dataset with expert-generated samples and an automatically mined challenge set. We experiment with grounding the model decision in these definitions using zero-shot and few-shot prompting. We then report results on several large language models (LLMs). With this task definition, automatic hate speech detection can be more closely aligned to enforceable laws, and hence assist in more rigorous enforcement of legal protections against harmful speech in public forums.

Prototype-Based Interpretability for Legal Citation Prediction
Chu Fei Luo | Rohan Bhambhoria | Samuel Dahan | Xiaodan Zhu
Findings of the Association for Computational Linguistics: ACL 2023

Deep learning has made significant progress in the past decade, and demonstrates potential to solve problems with extensive social impact. In high-stakes decision-making areas such as law, experts often require interpretability for automatic systems to be utilized in practical settings. In this work, we attempt to address these requirements applied to the important problem of legal citation prediction (LCP). We design the task with parallels to the thought process of lawyers, i.e., with reference to both precedents and legislative provisions. After initial experimental results, we refine the target citation predictions with the feedback of legal experts. Additionally, we introduce a prototype architecture to add interpretability, achieving strong performance while adhering to decision parameters used by lawyers. Our study builds on and leverages the state-of-the-art language processing models for law, while addressing vital considerations for high-stakes tasks with practical societal impact.

Co-authors

Radin Shayanfar 1

Venues

Findings 4
