Confidently Wrong: Exploring the Calibration and Expression of (Un)Certainty of Large Language Models in a Multilingual Setting

Lea Krause; Wondimagegnhue Tufa; Selene Baez Santamaria; Angel Daza; Urja Khurana; Piek Vossen

Abstract

While the fluency and coherence of Large Language Models (LLMs) in text generation have seen significant improvements, their competency in generating appropriate expressions of uncertainty remains limited. Using a multilingual closed-book QA task and GPT-3.5, we explore how well LLMs are calibrated and express certainty across a diverse set of languages, including low-resource settings. Our results reveal strong performance in high-resource languages but a marked decline in performance in lower-resource languages. Across all, we observe an exaggerated expression of confidence in the model, which does not align with the correctness or likelihood of its responses. Our findings highlight the need for further research into accurate calibration of LLMs, especially in a multilingual setting.

Anthology ID: 2023.mmnlg-1.1
Volume: Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023)
Month: September
Year: 2023
Address: Prague, Czech Republic
Editors: Albert Gatt, Claire Gardent, Liam Cripwell, Anya Belz, Claudia Borg, Aykut Erdem, Erkut Erdem
Venues: MMNLG | WS
Publisher: Association for Computational Linguistics
Pages: 1–9
URL: https://aclanthology.org/2023.mmnlg-1.1/
Cite (ACL): Lea Krause, Wondimagegnhue Tufa, Selene Baez Santamaria, Angel Daza, Urja Khurana, and Piek Vossen. 2023. Confidently Wrong: Exploring the Calibration and Expression of (Un)Certainty of Large Language Models in a Multilingual Setting. In Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023), pages 1–9, Prague, Czech Republic. Association for Computational Linguistics.
Cite (Informal): Confidently Wrong: Exploring the Calibration and Expression of (Un)Certainty of Large Language Models in a Multilingual Setting (Krause et al., MMNLG 2023)
PDF: https://aclanthology.org/2023.mmnlg-1.1.pdf
