2024
pdf
bib
abs
Advancing Large Language Model Attribution through Self-Improving
Lei Huang
|
Xiaocheng Feng
|
Weitao Ma
|
Liang Zhao
|
Yuchun Fan
|
Weihong Zhong
|
Dongliang Xu
|
Qing Yang
|
Hongtao Liu
|
Bing Qin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive. Inspired by recent advances in self-improvement that enhance LLMs without manual annotation, we present START, a Self-Taught AttRibuTion framework for iteratively improving the attribution capability of LLMs. First, to prevent models from stagnating due to initially insufficient supervision signals, START leverages the model to self-construct synthetic training data for warming up. To further self-improve the model’s attribution ability, START iteratively utilizes fine-grained preference supervision signals constructed from its sampled responses to encourage robust, comprehensive, and attributable generation. Experiments on three open-domain question-answering datasets, covering long-form QA and multi-step reasoning, demonstrate significant performance gains of 25.13% on average without relying on human annotations and more advanced models. Further analysis reveals that START excels in aggregating information across multiple sources.
pdf
bib
abs
GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization
Yangfan Ye
|
Xiachong Feng
|
Xiaocheng Feng
|
Weitao Ma
|
Libo Qin
|
Dongliang Xu
|
Qing Yang
|
Hongtao Liu
|
Bing Qin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
News summarization in today’s global scene can be daunting with its flood of multilingual content and varied viewpoints from different sources. However, current studies often neglect such real-world scenarios as they tend to focus solely on either single-language or single-document tasks. To bridge this gap, we aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel task, i.e., MCMS, which encapsulates the real-world requirements all-in-one. Nevertheless, the lack of a benchmark inhibits researchers from adequately studying this invaluable problem. To tackle this, we have meticulously constructed the GLOBESUMM dataset by first collecting a wealth of multilingual news reports and restructuring them into event-centric format. Additionally, we introduce the method of protocol-guided prompting for high-quality and cost-effective reference annotation. In MCMS, we also highlight the challenge of conflicts between news reports, in addition to the issues of redundancies and omissions, further enhancing the complexity of GLOBESUMM. Through extensive experimental analysis, we validate the quality of our dataset and elucidate the inherent challenges of the task. We firmly believe that GLOBESUMM, given its challenging nature, will greatly contribute to the multilingual communities and the evaluation of LLMs.
pdf
bib
abs
Learning Fine-Grained Grounded Citations for Attributed Large Language Models
Lei Huang
|
Xiaocheng Feng
|
Weitao Ma
|
Yuxuan Gu
|
Weihong Zhong
|
Xiachong Feng
|
Weijiang Yu
|
Weihua Peng
|
Duyu Tang
|
Dandan Tu
|
Bing Qin
Findings of the Association for Computational Linguistics: ACL 2024
Despite the impressive performance on information-seeking tasks, large language models (LLMs) still struggle with hallucinations. Attributed LLMs, which augment generated text with in-line citations, demonstrate potential in mitigating hallucinations and improving verifiability. However, current approaches suffer from suboptimal citation quality due to their reliance on in-context learning. Furthermore, the practice of merely citing document identifiers complicates the process for users to pinpoint specific supporting evidence. In this work, we introduce FRONT, a training framework that teaches LLMs to generate Fine-grained grounded citations. By initially grounding fine-grained supporting quotes, which then guide the generation process, these quotes not only provide supervision signals to improve citation quality but also serve as fine-grained attributions. Experiments on the ALCE benchmark demonstrate the efficacy of FRONT in generating superior grounded responses and highly supportive citations. With LLaMA-2-7B, the framework significantly outperforms all the baselines, achieving an average of 14.21% improvement in citation quality across all datasets, even surpassing ChatGPT.
pdf
bib
abs
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
Weihong Zhong
|
Xiaocheng Feng
|
Liang Zhao
|
Qiming Li
|
Lei Huang
|
Yuxuan Gu
|
Weitao Ma
|
Yuan Xu
|
Bing Qin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs’ subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, will LVLMs be misled and respond incorrectly, even though the ground visual information exists? To answer this, we propose a framework called \\textitMMHalSnowball to evaluate LVLMs’ behaviors when encountering generated hallucinations, where LVLMs are required to answer specific visual questions within a curated hallucinatory conversation. Crucially, our experiment shows that the performance of open-source LVLMs drops by at least 31\\%, indicating that LVLMs are prone to accept the generated hallucinations and make false claims that they would not have supported without distractions. We term this Multimodal Hallucination Snowballing. To mitigate this issue, we further propose a training-free method called Residual Visual Decoding, where we revise the output distribution of LVLMs with the one derived from the residual visual input, providing models with direct access to the visual information. Experiments show that our method can mitigate more than 24\\% of the snowballed multimodal hallucination while maintaining capabilities.