2025
MisinfoBench: A Multi-Dimensional Benchmark for Evaluating LLMs’ Resilience to Misinformation
Ye Yang | Donghe Li | Zuchen Li | Fengyuan Li | Jingyi Liu | Li Sun | Qingyu Yang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks but remain vulnerable to misinformation, particularly in multi-turn dialogues where misleading context accumulates. Existing benchmarks, such as TruthfulQA and FEVER, assess factual accuracy in isolated queries but fail to evaluate LLMs’ resilience to misinformation in interactive settings. To address this limitation, we introduce MisinfoBench, a multi-dimensional benchmark designed to assess LLMs’ ability to discern, resist, and reject misinformation. MisinfoBench defines three core dimensions—Discernment, Resistance, and Principled Refusal—across seven evaluation tasks, systematically testing misinformation identification, contextual resistance, and the rejection of coercive false premises. It includes a dataset of 4,962 multi-turn dialogues and 2,000 misinformation-based question-answer pairs, capturing diverse misinformation scenarios. We evaluate 16 LLMs, revealing substantial disparities in misinformation resilience: proprietary models outperform open-source counterparts, while multi-turn dialogues and cross-lingual settings exacerbate misinformation susceptibility. Our findings highlight persistent vulnerabilities in LLMs’ misinformation defenses, emphasizing the need for context-aware training, adversarial robustness, and principled reasoning. MisinfoBench establishes a rigorous standard for evaluating misinformation resilience, advancing the development of more trustworthy AI systems.
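Below is a minimal sketch of how a multi-turn "resistance" evaluation of the kind described above might be scored. The dialogue fields (`misleading_turns`, `probe_question`, `factual_answer`), the `query_model` stub, and the substring-match accuracy are illustrative assumptions for exposition, not MisinfoBench's actual data format or metric.

```python
# Hedged sketch: scoring an LLM's resistance to misinformation that accumulates
# over multi-turn dialogues. Schema, stub model, and scoring rule are assumptions.
from typing import Callable, Dict, List


def resistance_score(
    dialogues: List[Dict],
    query_model: Callable[[List[Dict[str, str]]], str],
) -> float:
    """Fraction of dialogues in which the model still gives the factual answer
    after misleading context has been injected in earlier turns."""
    correct = 0
    for d in dialogues:
        # Earlier turns carry the false premises; the final turn asks the probe question.
        messages = [{"role": "user", "content": turn} for turn in d["misleading_turns"]]
        messages.append({"role": "user", "content": d["probe_question"]})
        answer = query_model(messages)
        if d["factual_answer"].lower() in answer.lower():  # crude match, for illustration only
            correct += 1
    return correct / max(len(dialogues), 1)


if __name__ == "__main__":
    # Toy usage with a stub "model" that just echoes the last user message.
    stub = lambda msgs: msgs[-1]["content"]
    data = [{
        "misleading_turns": ["As everyone knows, the Great Wall is visible from the Moon."],
        "probe_question": "Is the Great Wall of China visible from the Moon?",
        "factual_answer": "no",
    }]
    print(f"resistance = {resistance_score(data, stub):.2f}")
```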
2024
Enhancing Contrastive Learning with Noise-Guided Attack: Towards Continual Relation Extraction in the Wild
Ting Wu | Jingyi Liu | Rui Zheng | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Continual relation extraction (CRE) requires adapting to emerging novel relations while preserving old knowledge. Existing CRE approaches excel at preserving old knowledge but falter when confronted with contaminated data streams, likely because they rest on the artificial assumption of error-free annotations. Recognizing the prevalence of noisy labels in real-world datasets, we introduce a more practical learning scenario, termed noisy-CRE. In response to this challenge, we propose a noise-resistant contrastive framework called Noise-guided Attack in Contrastive Learning (NaCL), aimed at learning incrementally arriving corrupted relations. Diverging from conventional approaches such as sample discarding or relabeling in the presence of noisy labels, NaCL instead modifies the feature space through targeted attacks that align representations with the provided, albeit inaccurate, labels, thereby enhancing contrastive representations. Extensive empirical validation demonstrates that NaCL yields consistent performance improvements as the noise rate increases, surpassing state-of-the-art methods.
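The following is a hedged sketch of the core idea in the abstract: instead of discarding or relabeling noisy samples, perturb the feature space with a *targeted* attack so that representations move toward the given (possibly incorrect) labels before applying a supervised contrastive loss. The one-step perturbation, step size, and the use of cross-entropy as the attack objective are assumptions for illustration, not NaCL's exact formulation.

```python
# Hedged sketch of a noise-guided targeted attack plus supervised contrastive loss.
import torch
import torch.nn.functional as F


def noise_guided_attack(encoder, classifier, x, given_labels, eps=1e-2):
    """One-step targeted perturbation of input embeddings that *decreases* the loss
    w.r.t. the provided labels, nudging features to align with them (sketch)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(encoder(x)), given_labels)
    loss.backward()
    # Descend the loss surface toward the given labels (targeted),
    # unlike the ascent used in standard adversarial training.
    return (x - eps * x.grad.sign()).detach()


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss over L2-normalized features."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Log-softmax over all non-self pairs.
    exp_sim = sim.masked_fill(self_mask, float("-inf")).exp()
    log_prob = sim - exp_sim.sum(dim=1, keepdim=True).log()
    denom = pos_mask.sum(dim=1).clamp(min=1)
    return -(pos_mask.float() * log_prob).sum(dim=1).div(denom).mean()

# Typical use (assumed): perturb a noisy batch with noise_guided_attack, re-encode
# the attacked embeddings, and optimize supervised_contrastive_loss on the result.
```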
2022
Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning
Qin Liu | Rui Zheng | Bao Rong | Jingyi Liu | ZhiHua Liu | Zhanzhan Cheng | Liang Qiao | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Adversarial robustness has attracted much attention recently, and the mainstream solution is adversarial training. However, the conventional practice of generating adversarial perturbations for each input embedding (in NLP settings) multiplies the training cost by the number of gradient steps needed to obtain the adversarial samples. To address this problem, we leverage the Flooding method, which primarily aims at better generalization and which we find promising for defending against adversarial attacks. We further propose an effective criterion that puts hyper-parameter-dependent flooding into effect with a narrowed-down search space, by measuring how the gradient steps taken within one epoch affect the loss of each batch. Our approach requires no adversarial samples for training, and its time consumption is equivalent to that of standard fine-tuning, which can be 2-15 times faster than standard adversarial training. We experimentally show that our method improves BERT’s resistance to textual adversarial attacks by a large margin and achieves state-of-the-art robust accuracy on various text classification and GLUE tasks.
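For reference, here is a minimal sketch of the Flooding objective the abstract builds on (Ishida et al., 2020): the training loss is kept from falling below a "flood level" b via |L - b| + b, so fine-tuning stops driving the loss toward zero. The paper's contribution, a criterion for selecting b from gradient behavior within an epoch, is not reproduced here; b appears as a plain hyper-parameter, and the toy model and optimizer setup are illustrative assumptions.

```python
# Hedged sketch of loss-restricted fine-tuning with a flooding objective.
import torch
import torch.nn.functional as F


def flooding_loss(logits, labels, flood_level=0.1):
    """Cross-entropy with flooding: |L - b| + b keeps the loss near b instead of 0."""
    loss = F.cross_entropy(logits, labels)
    return (loss - flood_level).abs() + flood_level


# Toy fine-tuning step on random data to show where the objective plugs in.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
optimizer.zero_grad()
flooding_loss(model(x), y).backward()
optimizer.step()
```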