Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation

John Mendonça, Patrícia Pereira, Helena Moniz, Joao Paulo Carvalho, Alon Lavie, Isabel Trancoso


Abstract
Despite significant research effort in the development of automatic dialogue evaluation metrics, little attention has been paid to evaluating dialogues in languages other than English. At the same time, ensuring that metrics are invariant to semantically similar responses is also an overlooked topic. In order to achieve the desired properties of robustness and multilinguality for dialogue evaluation metrics, we propose a novel framework that combines the strengths of current evaluation models with the newly-established paradigm of prompting Large Language Models (LLMs). Empirical results show our framework achieves state-of-the-art mean Spearman correlation scores across several benchmarks and ranks first on both the Robust and Multilingual tasks of the DSTC11 Track 4 “Automatic Evaluation Metrics for Open-Domain Dialogue Systems”, demonstrating the evaluation capabilities of prompted LLMs.
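As a rough sketch of the prompting paradigm described in the abstract (the prompt wording, the 1–5 score scale, and the `llm` callable below are illustrative assumptions, not the authors' released implementation):

```python
# Illustrative sketch only: prompt wording, 1-5 scale, and the `llm`
# callable are assumptions, not the authors' implementation.
from typing import Callable


def build_eval_prompt(context: str, response: str,
                      dimension: str = "overall quality") -> str:
    """Compose a simple evaluation prompt for a dialogue response."""
    return (
        f"Given the following dialogue context:\n{context}\n\n"
        f"Rate the {dimension} of this candidate response on a scale "
        f"from 1 to 5:\n{response}\n\n"
        "Answer with a single number."
    )


def score_response(llm: Callable[[str], str],
                   context: str, response: str) -> float:
    """Query an LLM with the evaluation prompt and parse a numeric score."""
    answer = llm(build_eval_prompt(context, response))
    digits = [c for c in answer if c.isdigit()]
    return float(digits[0]) if digits else float("nan")
```

Correlating such scores with human annotations (e.g., via Spearman correlation) is the standard way metrics of this kind are benchmarked.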
Anthology ID:
2023.dstc-1.16
Volume:
Proceedings of The Eleventh Dialog System Technology Challenge
Month:
September
Year:
2023
Address:
Prague, Czech Republic
Editors:
Yun-Nung Chen, Paul Crook, Michel Galley, Sarik Ghazarian, Chulaka Gunasekara, Raghav Gupta, Behnam Hedayatnia, Satwik Kottur, Seungwhan Moon, Chen Zhang
Venues:
DSTC | WS
Publisher:
Association for Computational Linguistics
Pages:
133–143
URL:
https://aclanthology.org/2023.dstc-1.16
Cite (ACL):
John Mendonça, Patrícia Pereira, Helena Moniz, Joao Paulo Carvalho, Alon Lavie, and Isabel Trancoso. 2023. Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation. In Proceedings of The Eleventh Dialog System Technology Challenge, pages 133–143, Prague, Czech Republic. Association for Computational Linguistics.
Cite (Informal):
Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation (Mendonça et al., DSTC-WS 2023)
PDF:
https://aclanthology.org/2023.dstc-1.16.pdf