Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models

Nishanth Madhusudhan, Sathwik Tejaswi Madhusudhan, Vikas Yadav, Masoud Hashemi


Abstract
Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability, referring to an LLM's capability to withhold responses when uncertain or lacking a definitive answer, without compromising performance. Although previous studies have attempted to improve AA, they lack a standardized evaluation method and remain unsuitable for black-box models where token prediction probabilities are inaccessible. This makes comparative analysis challenging, especially for state-of-the-art closed-source commercial LLMs. This paper bridges this gap by introducing a black-box evaluation approach and a new dataset, Abstain-QA, crafted to rigorously assess AA across varied question types (answerable and unanswerable), domains (well-represented and under-represented), and task types (fact-centric and reasoning). We also propose a new confusion matrix, the "Answerable-Unanswerable Confusion Matrix" (AUCM), which serves as the basis for evaluating AA by offering a structured and precise approach to assessment. Finally, we explore how three prompting strategies, Strict Prompting, Verbal Confidence Thresholding, and Chain-of-Thought (CoT), affect AA. Our results indicate that even powerful models such as GPT-4 and Mixtral 8x22b encounter difficulties with abstention; however, strategic approaches such as Strict Prompting and CoT can enhance this capability.
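The AUCM idea from the abstract can be sketched concretely: questions are split into answerable and unanswerable, and a black-box model's behavior on each is recorded as answering or abstaining. The minimal Python sketch below tallies such a matrix and computes one possible abstention-aware score; the cell names, the functions `tally_aucm` and `abstention_score`, and the aggregate metric are illustrative assumptions, not the paper's exact definitions.

```python
# Illustrative sketch of an Answerable-Unanswerable Confusion Matrix (AUCM) tally.
# Only the answerable/unanswerable split and the answer/abstain behaviours come
# from the abstract; the cell names and the score below are assumptions.
from collections import Counter


def tally_aucm(records):
    """Count AUCM cells from (is_answerable, abstained, correct) triples."""
    cells = Counter()
    for is_answerable, abstained, correct in records:
        if is_answerable:
            if abstained:
                cells["answerable_abstained"] += 1          # missed a question it should answer
            elif correct:
                cells["answerable_answered_correct"] += 1   # desired behaviour
            else:
                cells["answerable_answered_wrong"] += 1     # answered but incorrect
        else:
            if abstained:
                cells["unanswerable_abstained"] += 1        # desired behaviour
            else:
                cells["unanswerable_answered"] += 1         # should have abstained
    return cells


def abstention_score(cells):
    """Hypothetical aggregate: fraction of items handled as desired."""
    good = cells["answerable_answered_correct"] + cells["unanswerable_abstained"]
    total = sum(cells.values())
    return good / total if total else 0.0


if __name__ == "__main__":
    # Toy items: (is_answerable, model_abstained, model_correct).
    demo = [
        (True, False, True),    # answerable, answered correctly
        (True, True, False),    # answerable, but the model abstained
        (False, True, False),   # unanswerable, correctly abstained
        (False, False, False),  # unanswerable, but the model still answered
    ]
    cells = tally_aucm(demo)
    print(dict(cells))
    print(f"abstention-aware accuracy (illustrative): {abstention_score(cells):.2f}")
```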
Anthology ID:
2025.coling-main.627
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
9329–9345
URL:
https://aclanthology.org/2025.coling-main.627/
Cite (ACL):
Nishanth Madhusudhan, Sathwik Tejaswi Madhusudhan, Vikas Yadav, and Masoud Hashemi. 2025. Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9329–9345, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models (Madhusudhan et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.627.pdf