Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric

Ian Berlot-Attwell, Frank Rudzicz


Abstract
In this work, we evaluate various existing dialogue relevance metrics and find that their performance depends strongly on the dataset, often with poor correlation with human scores of relevance. We propose modifications that reduce data requirements and domain sensitivity while improving correlation. Our proposed metric achieves state-of-the-art performance on the HUMOD dataset while reducing measured sensitivity to dataset by 37%-66%. We achieve this without fine-tuning a pretrained language model, using only 3,750 unannotated human dialogues and a single negative example. Despite these limitations, we demonstrate competitive performance on four datasets from different domains. Our code, including our metric and experiments, is open source.
Anthology ID:
2022.nlp4convai-1.14
Volume:
Proceedings of the 4th Workshop on NLP for Conversational AI
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Bing Liu, Alexandros Papangelis, Stefan Ultes, Abhinav Rastogi, Yun-Nung Chen, Georgios Spithourakis, Elnaz Nouri, Weiyan Shi
Venue:
NLP4ConvAI
Publisher:
Association for Computational Linguistics
Pages:
166–183
URL:
https://aclanthology.org/2022.nlp4convai-1.14
DOI:
10.18653/v1/2022.nlp4convai-1.14
Cite (ACL):
Ian Berlot-Attwell and Frank Rudzicz. 2022. Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric. In Proceedings of the 4th Workshop on NLP for Conversational AI, pages 166–183, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric (Berlot-Attwell & Rudzicz, NLP4ConvAI 2022)
PDF:
https://aclanthology.org/2022.nlp4convai-1.14.pdf
Video:
https://aclanthology.org/2022.nlp4convai-1.14.mp4
Code:
ikb-a/idk-dialogue-relevance
Data:
FED