2024
Editing Conceptual Knowledge for Large Language Models
Xiaohan Wang | Shengyu Mao | Shumin Deng | Yunzhi Yao | Yue Shen | Lei Liang | Jinjie Gu | Huajun Chen | Ningyu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024
Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations explore only instance-level editing, while it remains unclear whether LLMs possess the capability to modify concepts. This paper pioneers the investigation of editing conceptual knowledge for LLMs by constructing a novel benchmark dataset, ConceptEdit, and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing editing methods can efficiently modify concept-level definitions to some extent, they also have the potential to distort the related instantial knowledge in LLMs, leading to poor performance. We anticipate this work can inspire further progress in understanding LLMs.
Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs
Junjie Wang | Mingyang Chen | Binbin Hu | Dan Yang | Ziqi Liu | Yue Shen | Peng Wei | Zhiqiang Zhang | Jinjie Gu | Jun Zhou | Jeff Z. Pan | Wen Zhang | Huajun Chen
Findings of the Association for Computational Linguistics: EMNLP 2024
Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs’ performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs’ planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.
Unified Hallucination Detection for Multimodal Large Language Models
Xiang Chen | Chenxi Wang | Yida Xue | Ningyu Zhang | Xiaoyan Yang | Qiang Li | Yue Shen | Lei Liang | Jinjie Gu | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) are plagued by the critical issue of hallucination. The reliable detection of such hallucinations in MLLMs has, therefore, become a vital aspect of model evaluation and the safeguarding of practical application deployment. Prior research in this domain has been constrained by a narrow focus on singular tasks, an inadequate range of hallucination categories addressed, and a lack of detailed granularity. In response to these challenges, our work expands the investigative horizons of hallucination detection. We present a novel meta-evaluation benchmark, MHaluBench, meticulously crafted to facilitate the evaluation of advancements in hallucination detection methods. Additionally, we unveil a novel unified multimodal hallucination detection framework, UNIHD, which leverages a suite of auxiliary tools to validate the occurrence of hallucinations robustly. We demonstrate the effectiveness of UNIHD through meticulous evaluation and comprehensive analysis. We also provide strategic insights on the application of specific tools for addressing various categories of hallucinations.
CharPoet: A Chinese Classical Poetry Generation System Based on Token-free LLM
Chengyue Yu | Lei Zang | Jiaotuan Wang | Chenyi Zhuang | Jinjie Gu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Automatic Chinese classical poetry generation has attracted much research interest, but achieving effective control over format and content simultaneously remains challenging. Traditional systems usually accept keywords as user inputs, resulting in limited control over content. Large language models (LLMs) improve content control by allowing unrestricted user instructions, but the token-by-token generation process frequently makes format errors. Motivated by this, we propose CharPoet, a Chinese classical poetry generation system based on token-free LLM, which provides effective control over both format and content. Our token-free architecture generates in a character-by-character manner, enabling precise control over the number of characters. Pruned from existing token-based LLMs, CharPoet inherits their pretrained capabilities and can generate poetry following instructions like “Write me a poem for my mother’s birthday.” CharPoet achieves format accuracy above 0.96, outperforming Jiuge-GPT-2 (0.91) and GPT-4 (0.38). In terms of content quality, CharPoet surpasses traditional systems including Jiuge, and is comparable to other LLMs. Our system is open source and available at https://modelscope.cn/models/CharPoet/CharPoet. A video demonstration of CharPoet is available at https://youtu.be/voZ25qEp3Dc.