Selective-LAMA: Selective Prediction for Confidence-Aware Evaluation of Language Models

Hiyori Yoshikawa, Naoaki Okazaki


Abstract
Recent studies have suggested that neural language models learn and store a large number of facts and commonsense knowledge from training data. The ability of language models to recall such knowledge is often evaluated via zero-shot cloze-style QA tasks. However, such evaluations rely only on prediction accuracy without punishing the systems for their mistakes, e.g., simply guessing or hallucinating likely answers. Selective prediction is a more informative evaluation framework that takes the confidence of predictions into account. Under the selective prediction setting, a model is evaluated not only by the number of correct predictions, but also by its ability to filter out dubious predictions by estimating the confidence of individual predictions. Such confidence-aware evaluation is crucial for determining whether to trust zero-shot predictions of language models. In this paper, we apply the selective prediction setting to an existing benchmark, the LAMA probe, and conduct extensive experiments with recent neural language models and different confidence functions. We empirically show that our Selective-LAMA evaluation is more robust to the effect of simple guesses than the conventional accuracy-based evaluation. Our evaluation reveals the importance of the choice of confidence functions by showing that simply relying on token probabilities is not always the best choice. Further analysis shows that various confidence functions exhibit different preferences over predicted tokens for a given context.
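To make the selective prediction setting concrete, the following is a minimal sketch of one common way to score it, a risk-coverage curve. The abstract does not state which metrics the paper uses, so the choice of this curve, its averaged summary, and the names `risk_coverage_curve` and `area_under_risk_coverage` are assumptions made for illustration only, and the confidence values and correctness flags are made-up toy inputs, not results from the paper.

```python
from typing import List, Tuple


def risk_coverage_curve(
    confidences: List[float], correct: List[bool]
) -> List[Tuple[float, float]]:
    """Return (coverage, risk) points for a selective predictor.

    Items are ranked by confidence, highest first. For each k, the model
    answers only its k most confident items and abstains on the rest:
    coverage = k / n, risk = error rate among the k answered items.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    # Indices sorted by descending confidence (ties keep input order).
    order = sorted(
        range(len(confidences)), key=lambda i: confidences[i], reverse=True
    )

    points = []
    errors = 0
    for k, idx in enumerate(order, start=1):
        if not correct[idx]:
            errors += 1
        points.append((k / len(order), errors / k))
    return points


def area_under_risk_coverage(points: List[Tuple[float, float]]) -> float:
    """Average risk over all coverage levels (lower is better)."""
    return sum(risk for _, risk in points) / len(points)


# Toy example: the same four predictions (two correct, two wrong) scored
# by two hypothetical confidence functions.
is_correct = [True, False, True, False]
conf_a = [0.9, 0.8, 0.6, 0.3]  # ranks a wrong answer second
conf_b = [0.9, 0.2, 0.8, 0.1]  # ranks both correct answers first

for name, conf in [("A", conf_a), ("B", conf_b)]:
    curve = risk_coverage_curve(conf, is_correct)
    print(name, curve, area_under_risk_coverage(curve))
```

Both toy confidence functions yield the same plain accuracy because the predictions are identical, but the one that ranks correct answers above wrong ones gets a lower averaged risk. This is the kind of difference that accuracy alone cannot show and that a confidence-aware evaluation is meant to expose.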
Anthology ID:
2023.findings-eacl.150
Volume:
Findings of the Association for Computational Linguistics: EACL 2023
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2017–2028
URL:
https://aclanthology.org/2023.findings-eacl.150
DOI:
10.18653/v1/2023.findings-eacl.150
Cite (ACL):
Hiyori Yoshikawa and Naoaki Okazaki. 2023. Selective-LAMA: Selective Prediction for Confidence-Aware Evaluation of Language Models. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2017–2028, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Selective-LAMA: Selective Prediction for Confidence-Aware Evaluation of Language Models (Yoshikawa & Okazaki, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-eacl.150.pdf
Software:
 2023.findings-eacl.150.software.zip
Video:
 https://aclanthology.org/2023.findings-eacl.150.mp4