Sathwik Tejaswi Madhusudhan


2025

Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models
Nishanth Madhusudhan | Sathwik Tejaswi Madhusudhan | Vikas Yadav | Masoud Hashemi
Proceedings of the 31st International Conference on Computational Linguistics

Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability, referring to an LLM's capability to withhold responses when uncertain or lacking a definitive answer, without compromising performance. Although previous studies have attempted to improve AA, they lack a standardized evaluation method and remain unsuitable for black-box models where token prediction probabilities are inaccessible. This makes comparative analysis challenging, especially for state-of-the-art closed-source commercial LLMs. This paper bridges this gap by introducing a black-box evaluation approach and a new dataset, Abstain-QA, crafted to rigorously assess AA across varied question types (answerable and unanswerable), domains (well-represented and under-represented), and task types (fact-centric and reasoning). We also propose a new confusion matrix, the “Answerable-Unanswerable Confusion Matrix” (AUCM), which serves as the basis for evaluating AA by offering a structured and precise approach to assessment. Finally, we explore the impact of three prompting strategies on improving AA: Strict Prompting, Verbal Confidence Thresholding, and Chain-of-Thought (CoT). Our results indicate that even powerful models such as GPT-4 and Mixtral 8x22b encounter difficulties with abstention; however, strategic approaches such as Strict Prompting and CoT can enhance this capability.
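The AUCM idea amounts to scoring abstention as a binary decision over question answerability: a model should answer answerable questions and abstain on unanswerable ones. Below is a minimal sketch of how such a matrix could be tallied; the record layout, function name, and the derived accuracy metric are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter

def aucm(records):
    """Tally a sketch of an Answerable-Unanswerable Confusion Matrix.

    Each record pairs the ground truth (is the question answerable?)
    with the model's observed behaviour (did it abstain?). Desired
    behaviour: answer answerable questions, abstain on unanswerable ones.
    """
    cells = Counter()
    for is_answerable, abstained in records:
        truth = "answerable" if is_answerable else "unanswerable"
        action = "abstained" if abstained else "answered"
        cells[(truth, action)] += 1
    correct = cells[("answerable", "answered")] + cells[("unanswerable", "abstained")]
    total = sum(cells.values())
    return cells, correct / total if total else 0.0

# Example: 3 answerable questions (one wrongly abstained) and
# 2 unanswerable ones (one wrongly answered) -> accuracy 3/5.
records = [(True, False), (True, False), (True, True),
           (False, True), (False, False)]
cells, acc = aucm(records)
print(cells, acc)
```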

2024

Enhancing Alignment using Curriculum Learning & Ranked Preferences
Pulkit Pattnaik | Rishabh Maheshwary | Kelechi Ogueji | Vikas Yadav | Sathwik Tejaswi Madhusudhan
Findings of the Association for Computational Linguistics: EMNLP 2024

Direct Preference Optimization (DPO) is an effective technique that leverages pairwise preference data (one chosen and one rejected response per prompt) to align LLMs to human preferences. In practice, multiple responses of varying relative quality can exist for a given prompt. We propose to utilize these responses to create multiple preference pairs per prompt. Our work focuses on aligning LLMs by systematically curating multiple preference pairs and presenting them in a meaningful order that facilitates curriculum learning, enhancing the prominent DPO technique. We order the preference pairs from easy to hard according to various criteria, thus emulating curriculum learning. Our method, referred to as Curri-DPO, consistently shows performance gains on MTbench, Vicuna bench, and WizardLM, highlighting its effectiveness over the standard DPO setting that utilizes a single preference pair. More specifically, Curri-DPO achieves a score of 7.43 on MTbench with Zephyr-7B, outperforming the majority of existing LLMs of similar parameter size. Curri-DPO also achieves the highest win rates on the Vicuna, WizardLM, and UltraFeedback test sets (90.7%, 87.1%, and 87.9%, respectively) in our experiments, with notable gains of up to 7.5% over standard DPO. We release the preference pairs used in alignment at: https://huggingface.co/datasets/ServiceNow-AI/Curriculum_DPO_preferences.
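To illustrate the curriculum construction, here is a minimal sketch of turning several rated responses for one prompt into preference pairs ordered easy to hard. The gap-based difficulty criterion, field names, and function name are assumptions for illustration; the paper explores multiple ordering criteria.

```python
from itertools import combinations

def curriculum_pairs(responses):
    """Build preference pairs from rated responses for one prompt and
    order them easy-to-hard, sketching the Curri-DPO idea.

    `responses` is a list of (text, quality_score) tuples. A pair with
    a large score gap is treated as "easy" (the preference is obvious);
    a small gap is "hard".
    """
    pairs = []
    for (a_text, a_score), (b_text, b_score) in combinations(responses, 2):
        if a_score == b_score:
            continue  # tie: no clear chosen/rejected, so skip
        chosen, rejected = ((a_text, b_text) if a_score > b_score
                            else (b_text, a_text))
        pairs.append({"chosen": chosen, "rejected": rejected,
                      "gap": abs(a_score - b_score)})
    # Largest gap first => easy pairs are presented before hard ones.
    return sorted(pairs, key=lambda p: p["gap"], reverse=True)

# Three responses to one prompt, rated e.g. by a reward model or judge.
responses = [("answer A", 9.0), ("answer B", 6.5), ("answer C", 4.0)]
for p in curriculum_pairs(responses):
    print(p["gap"], "|", p["chosen"], ">", p["rejected"])
```

Each emitted pair would then feed a DPO update in the listed order, so training sees unambiguous preferences before borderline ones.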