2025
MisinfoBench: A Multi-Dimensional Benchmark for Evaluating LLMs’ Resilience to Misinformation
Ye Yang | Donghe Li | Zuchen Li | Fengyuan Li | Jingyi Liu | Li Sun | Qingyu Yang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks but remain vulnerable to misinformation, particularly in multi-turn dialogues where misleading context accumulates. Existing benchmarks, such as TruthfulQA and FEVER, assess factual accuracy in isolated queries but fail to evaluate LLMs’ resilience to misinformation in interactive settings. To address this limitation, we introduce MisinfoBench, a multi-dimensional benchmark designed to assess LLMs’ ability to discern, resist, and reject misinformation. MisinfoBench defines three core dimensions—Discernment, Resistance, and Principled Refusal—across seven evaluation tasks, systematically testing misinformation identification, contextual resistance, and the rejection of coercive false premises. It includes a dataset of 4,962 multi-turn dialogues and 2,000 misinformation-based question-answer pairs, capturing diverse misinformation scenarios. We evaluate 16 LLMs, revealing substantial disparities in misinformation resilience: proprietary models outperform open-source counterparts, while multi-turn dialogues and cross-lingual settings exacerbate misinformation susceptibility. Our findings highlight persistent vulnerabilities in LLMs’ misinformation defenses, emphasizing the need for context-aware training, adversarial robustness, and principled reasoning. MisinfoBench establishes a rigorous standard for evaluating misinformation resilience, advancing the development of more trustworthy AI systems.
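Below is a minimal sketch of how a multi-turn "resistance" evaluation of the kind described above might be scored. The dialogue fields (`misleading_turns`, `probe_question`, `factual_answer`), the `query_model` stub, and the substring-match accuracy are illustrative assumptions for exposition, not MisinfoBench's actual data format or metric.

```python
# Hedged sketch: scoring an LLM's resistance to misinformation that accumulates
# over multi-turn dialogues. Schema, stub model, and scoring rule are assumptions.
from typing import Callable, Dict, List


def resistance_score(
    dialogues: List[Dict],
    query_model: Callable[[List[Dict[str, str]]], str],
) -> float:
    """Fraction of dialogues in which the model still gives the factual answer
    after misleading context has been injected in earlier turns."""
    correct = 0
    for d in dialogues:
        # Earlier turns carry the false premises; the final turn asks the probe question.
        messages = [{"role": "user", "content": turn} for turn in d["misleading_turns"]]
        messages.append({"role": "user", "content": d["probe_question"]})
        answer = query_model(messages)
        if d["factual_answer"].lower() in answer.lower():  # crude match, for illustration only
            correct += 1
    return correct / max(len(dialogues), 1)


if __name__ == "__main__":
    # Toy usage with a stub "model" that just echoes the last user message.
    stub = lambda msgs: msgs[-1]["content"]
    data = [{
        "misleading_turns": ["As everyone knows, the Great Wall is visible from the Moon."],
        "probe_question": "Is the Great Wall of China visible from the Moon?",
        "factual_answer": "no",
    }]
    print(f"resistance = {resistance_score(data, stub):.2f}")
```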
2024
Enhancing Contrastive Learning with Noise-Guided Attack: Towards Continual Relation Extraction in the Wild
Ting Wu | Jingyi Liu | Rui Zheng | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Continual relation extraction (CRE) requires adapting to emerging novel relations while preserving old knowledge. Existing CRE approaches excel at preserving old knowledge but falter when confronted with contaminated data streams, likely because they rest on the artificial assumption of error-free annotations. Recognizing the prevalence of noisy labels in real-world datasets, we introduce a more practical learning scenario, termed noisy-CRE. In response to this challenge, we propose a noise-resistant contrastive framework called Noise-guided Attack in Contrastive Learning (NaCL), aimed at learning incrementally arriving corrupted relations. Diverging from conventional approaches such as sample discarding or relabeling in the presence of noisy labels, NaCL instead modifies the feature space through targeted attacks that align representations with the provided, albeit inaccurate, labels, thereby enhancing contrastive representations. Extensive empirical validation demonstrates that NaCL yields consistent performance improvements as the noise rate increases, surpassing state-of-the-art methods.
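The following is a hedged sketch of the core idea in the abstract: instead of discarding or relabeling noisy samples, perturb the feature space with a *targeted* attack so that representations move toward the given (possibly incorrect) labels before applying a supervised contrastive loss. The one-step perturbation, step size, and the use of cross-entropy as the attack objective are assumptions for illustration, not NaCL's exact formulation.

```python
# Hedged sketch of a noise-guided targeted attack plus supervised contrastive loss.
import torch
import torch.nn.functional as F


def noise_guided_attack(encoder, classifier, x, given_labels, eps=1e-2):
    """One-step targeted perturbation of input embeddings that *decreases* the loss
    w.r.t. the provided labels, nudging features to align with them (sketch)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(encoder(x)), given_labels)
    loss.backward()
    # Descend the loss surface toward the given labels (targeted),
    # unlike the ascent used in standard adversarial training.
    return (x - eps * x.grad.sign()).detach()


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss over L2-normalized features."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Log-softmax over all non-self pairs.
    exp_sim = sim.masked_fill(self_mask, float("-inf")).exp()
    log_prob = sim - exp_sim.sum(dim=1, keepdim=True).log()
    denom = pos_mask.sum(dim=1).clamp(min=1)
    return -(pos_mask.float() * log_prob).sum(dim=1).div(denom).mean()

# Typical use (assumed): perturb a noisy batch with noise_guided_attack, re-encode
# the attacked embeddings, and optimize supervised_contrastive_loss on the result.
```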
2022
Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning
Qin Liu | Rui Zheng | Bao Rong | Jingyi Liu | ZhiHua Liu | Zhanzhan Cheng | Liang Qiao | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Adversarial robustness has attracted much attention recently, and the mainstream solution is adversarial training. However, the conventional practice of generating adversarial perturbations for each input embedding (in NLP settings) multiplies the training cost by the number of gradient steps needed to obtain the adversarial samples. To address this problem, we leverage the Flooding method, which primarily aims at better generalization and which we find promising for defending against adversarial attacks. We further propose an effective criterion that puts hyper-parameter-dependent flooding into effect with a narrowed-down search space, by measuring how the gradient steps taken within one epoch affect the loss of each batch. Our approach requires no adversarial samples for training, and its time consumption is equivalent to that of standard fine-tuning, which can be 2-15 times faster than standard adversarial training. We experimentally show that our method improves BERT’s resistance to textual adversarial attacks by a large margin and achieves state-of-the-art robust accuracy on various text classification and GLUE tasks.
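For reference, here is a minimal sketch of the Flooding objective the abstract builds on (Ishida et al., 2020): the training loss is kept from falling below a "flood level" b via |L - b| + b, so fine-tuning stops driving the loss toward zero. The paper's contribution, a criterion for selecting b from gradient behavior within an epoch, is not reproduced here; b appears as a plain hyper-parameter, and the toy model and optimizer setup are illustrative assumptions.

```python
# Hedged sketch of loss-restricted fine-tuning with a flooding objective.
import torch
import torch.nn.functional as F


def flooding_loss(logits, labels, flood_level=0.1):
    """Cross-entropy with flooding: |L - b| + b keeps the loss near b instead of 0."""
    loss = F.cross_entropy(logits, labels)
    return (loss - flood_level).abs() + flood_level


# Toy fine-tuning step on random data to show where the objective plugs in.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
optimizer.zero_grad()
flooding_loss(model(x), y).backward()
optimizer.step()
```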