LLMs May Perform MCQA by Selecting the Least Incorrect Option

Haochun Wang; Sendong Zhao; Zewen Qiang; Nuwa Xi; Bing Qin (秦兵); Ting Liu

Abstract

In the field of NLP, Large Language Models (LLMs) have markedly enhanced performance across a variety of tasks. However, the comprehensive evaluation of LLMs remains a persistent challenge for the community. Recently, the adoption of Multiple Choice Question Answering (MCQA) as a benchmark for assessing LLMs has gained considerable traction, yet concerns regarding the robustness of this evaluation method persist. Building upon previous discussions on the issue of variability, we reveal an additional dimension of concern: LLMs may perform MCQA by selecting the least incorrect option rather than a distinctly correct one. This observation suggests that LLMs might regard multiple options as correct, which could undermine the reliability of MCQA as a metric for evaluating LLMs. To address this challenge, we introduce an enhanced dataset augmentation method for MCQA, termed MCQA+, to provide a more accurate reflection of model performance, thereby highlighting the necessity for more sophisticated evaluation mechanisms in the assessment of LLM capabilities.

Anthology ID: 2025.coling-main.390
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 5852–5862
URL: https://aclanthology.org/2025.coling-main.390/
Cite (ACL): Haochun Wang, Sendong Zhao, Zewen Qiang, Nuwa Xi, Bing Qin, and Ting Liu. 2025. LLMs May Perform MCQA by Selecting the Least Incorrect Option. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5852–5862, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): LLMs May Perform MCQA by Selecting the Least Incorrect Option (Wang et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.390.pdf
