C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

Liliang Ren, Mankeerat Sidhu, Qi Zeng, Revanth Gangi Reddy, Heng Ji, ChengXiang Zhai


Abstract
Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative 60.5% higher Spearman correlation on average for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.
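A minimal sketch of the idea, not the authors' released implementation: assuming C-PMI between the next user utterance u and the system response r given context c decomposes as log P(u | c, r) - log P(u | c), it can be estimated with any pretrained language model. The model name ("gpt2"), the helper functions, and the example dialogue below are illustrative assumptions.

```python
# Illustrative sketch of a conditional PMI scorer for dialogue turns.
# Assumes C-PMI(u; r | c) = log P(u | c, r) - log P(u | c), scored with GPT-2.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def log_prob(continuation: str, prefix: str) -> float:
    """Sum of token log-probabilities of `continuation` given `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, cont_ids], dim=-1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Next-token log-probabilities for every position except the last.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions that predict tokens of the continuation.
    n_cont = cont_ids.size(-1)
    return token_lp[:, -n_cont:].sum().item()


def c_pmi(next_user_utt: str, system_resp: str, context: str) -> float:
    """C-PMI(u; r | c) = log P(u | c, r) - log P(u | c)."""
    return log_prob(next_user_utt, context + " " + system_resp) - log_prob(
        next_user_utt, context
    )


if __name__ == "__main__":
    context = "User: I just got back from a trip to Japan."
    response = "System: That sounds amazing! What was your favorite city?"
    follow_up = "User: Kyoto, the temples were beautiful."
    print(f"C-PMI score: {c_pmi(follow_up, response, context):.3f}")
```

A response that makes the observed user follow-up more likely than the context alone receives a positive score, which is the intuition behind replacing a pure negative log-likelihood scorer with a mutual-information-based one.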
Anthology ID:
2023.dialdoc-1.9
Volume:
Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Smaranda Muresan, Vivian Chen, Casey Kennington, David Vandyke, Nina Dethlefs, Koji Inoue, Erik Ekstedt, Stefan Ultes
Venue:
dialdoc
Publisher:
Association for Computational Linguistics
Pages:
80–85
URL:
https://aclanthology.org/2023.dialdoc-1.9
DOI:
10.18653/v1/2023.dialdoc-1.9
Cite (ACL):
Liliang Ren, Mankeerat Sidhu, Qi Zeng, Revanth Gangi Reddy, Heng Ji, and ChengXiang Zhai. 2023. C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation. In Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, pages 80–85, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation (Ren et al., dialdoc 2023)
PDF:
https://aclanthology.org/2023.dialdoc-1.9.pdf