2024
On the Intractability to Synthesize Factual Inconsistencies in Summarization
Ge Luo, Weisi Fan, Miaoran Li, Youbiao He, Yinfei Yang, Forrest Bao
Findings of the Association for Computational Linguistics: EACL 2024
Factual consistency detection has received increasing attention in abstractive summarization. Many existing works rely on synthetic training data, which may not accurately reflect or match the inconsistencies produced by summarization models. In this paper, we first systematically analyze the shortcomings of current methods for synthesizing inconsistent summaries. Our quantitative and qualitative study shows that these methods may fail to produce inconsistencies involving coreference errors and discourse errors. Then, using the parameter-efficient fine-tuning (PEFT) technique, we find that a competitive factual consistency detector can be built from thousands of real model-generated summaries with human annotations. Our study demonstrates the importance of real machine-generated texts with human annotations in NLG evaluation: our model outperforms the SOTA on the CoGenSumm, FactCC, Frank, and SummEval datasets.
Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models
Miaoran Li, Baolin Peng, Michel Galley, Jianfeng Gao, Zhu Zhang
Findings of the Association for Computational Linguistics: NAACL 2024
Fact-checking is an essential NLP task that is commonly used to verify the factual accuracy of a piece of text. Previous approaches mainly rely on the resource-intensive process of fine-tuning pre-trained language models on specific datasets. In addition, there is a notable lack of datasets focused on fact-checking texts generated by large language models (LLMs). In this paper, we introduce Self-Checker, a plug-and-play framework that harnesses LLMs for efficient and rapid few-shot fact-checking. We also present the BingCheck dataset, specifically designed for fact-checking texts generated by LLMs. Empirical results demonstrate the potential of Self-Checker in using LLMs for fact-checking. Although significant room for improvement remains compared with state-of-the-art fine-tuned models, the results indicate that adopting LLMs could be a promising direction for future fact-checking research.
SummaCoz: A Dataset for Improving the Interpretability of Factual Consistency Detection for Summarization
Ge Luo, Weisi Fan, Miaoran Li, Guoruizhe Sun, Runlong Zhang, Chenyu Xu, Forrest Sheng Bao
Findings of the Association for Computational Linguistics: EMNLP 2024
Summarization is an important application of Large Language Models (LLMs). When judging the quality of a summary, factual consistency carries significant weight. Despite numerous efforts to build factual inconsistency detectors, the exploration of explainability remains limited in existing work. In this study, we incorporate both human-annotated and model-generated natural language explanations of how a summary deviates from, and thus becomes inconsistent with, its source article. We build our explanation-augmented dataset on top of the widely used SummaC summarization consistency benchmark. Additionally, we develop an inconsistency detector that is jointly trained with the collected explanations. Our findings demonstrate that integrating explanations during training not only enables the model to provide rationales for its judgments but also significantly enhances its accuracy.
2023
Enhancing Task Bot Engagement with Synthesized Open-Domain Dialog
Miaoran Li, Baolin Peng, Michel Galley, Jianfeng Gao, Zhu (Drew) Zhang
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
The construction of dialog systems for various types of conversations, such as task-oriented dialog (TOD) and open-domain dialog (ODD), has been an active area of research. To more closely mimic human conversations, which often blend different dialog modes, it is important to develop systems that can effectively handle both TOD and ODD and access different knowledge sources. In this work, we present a new automatic framework to enrich TODs with synthesized ODDs. We also introduce the PivotBot model, which can handle both TOD and ODD modes and access different knowledge sources to generate informative responses. Evaluation results indicate the superior ability of the proposed model to switch smoothly between TOD and ODD tasks.