Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models

Shuaijie She, Shujian Huang, Xingyun Wang, Yanke Zhou, Jiajun Chen


Abstract
LLMs (Large Language Models) usually interact with users in the form of dialogue and generate responses following their instructions, which naturally requires dialogue comprehension abilities. However, dialogue comprehension is a general language ability that is hard to evaluate directly. In this work, we propose to perform the evaluation focusing on the factual consistency issue with the help of the dialogue summarization task. Besides evaluating and analyzing the dialogue summarization performance (DIAC-Sum) of different LLMs, we also derive factual questions from the generated summaries and use them as a more flexible measurement of dialogue comprehension (DIAC-FactQA). Our evaluation shows that, on average, 26.8% of the summaries generated by LLMs contain factual inconsistencies. Even ChatGPT, the strongest model evaluated, has such errors in 16% of its summaries. Answering the factual questions is more challenging: the average error rate across all evaluated LLMs is 36.1%. Both results indicate serious deficiencies. Detailed analysis shows that understanding the subject/object of a conversation remains challenging for LLMs. Furthermore, to stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with automatically constructed multi-task data, which achieves a relative error rate reduction of 11% on DIAC-FactQA.
Anthology ID:
2024.naacl-long.338
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
6087–6100
URL:
https://aclanthology.org/2024.naacl-long.338
Cite (ACL):
Shuaijie She, Shujian Huang, Xingyun Wang, Yanke Zhou, and Jiajun Chen. 2024. Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6087–6100, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models (She et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.338.pdf
Copyright:
2024.naacl-long.338.copyright.pdf