What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation

Ehsan Lotfi; Maxime De Bruyn; Jeska Buhmann; Walter Daelemans

doi:10.18653/v1/2022.gem-1.47

Abstract

Generative conversational agents are known to suffer from problems like inconsistency and hallucination, and a big challenge in studying these issues remains evaluation: they are not properly reflected in common text generation metrics like perplexity or BLEU, and alternative implicit methods like semantic similarity or NLI labels can be misguided when a few specific tokens are decisive. In this work we propose ConsisTest, a factual consistency benchmark including both WH and Y/N questions based on PersonaChat, along with a hybrid evaluation pipeline which aims to get the best of symbolic and sub-symbolic methods. Using these and focusing on pretrained generative models like BART, we provide detailed statistics and analysis on how the model’s consistency is affected by variations in question and context.

Anthology ID:: 2022.gem-1.47
Volume:: Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Month:: December
Year:: 2022
Address:: Abu Dhabi, United Arab Emirates (Hybrid)
Editors:: Antoine Bosselut, Khyathi Chandu, Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Yacine Jernite, Jekaterina Novikova, Laura Perez-Beltrachini
Venue:: GEM
SIG:: SIGGEN
Publisher:: Association for Computational Linguistics
Pages:: 509–519
URL:: https://aclanthology.org/2022.gem-1.47/
DOI:: 10.18653/v1/2022.gem-1.47
Cite (ACL):: Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, and Walter Daelemans. 2022. What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation. In Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 509–519, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):: What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation (Lotfi et al., GEM 2022)
PDF:: https://aclanthology.org/2022.gem-1.47.pdf
Video:: https://aclanthology.org/2022.gem-1.47.mp4
