Adversarial Robustness for Large Language NER models using Disentanglement and Word Attributions

Xiaomeng Jin, Bhanukiran Vinzamuri, Sriram Venkatapathy, Heng Ji, Pradeep Natarajan


Abstract
Large language models (LLMs) have been widely used for applications such as question answering, text classification, and clustering. While preliminary results across these tasks look promising, recent work has shown that LLMs perform poorly on complex Named Entity Recognition (NER) tasks compared to fine-tuned pre-trained language models (PLMs). To encourage wider adoption of LLMs, our paper investigates the robustness of such LLM NER models and their instruction fine-tuned variants to adversarial attacks. In particular, we propose a novel attack that relies on disentanglement and word attribution techniques: the former learns an embedding that captures entity and non-entity influences separately, and the latter identifies important words across both components. This stands in stark contrast to most techniques, which primarily perturb non-entity words, limiting the space explored when synthesizing effective adversarial examples. Adversarial training based on our method improves the F1 score over the original LLM NER model by 8% on the CoNLL-2003 dataset and 18% on the OntoNotes 5.0 dataset.
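The word-attribution step the abstract describes can be illustrated with a minimal NumPy sketch: score every token (entity and non-entity alike) by a saliency-style gradient-times-input attribution, then select the top-k tokens as perturbation candidates. The linear scorer, embeddings, and token names below are hypothetical stand-ins, not the paper's actual model or code.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Hypothetical token embeddings and a linear "entity scorer"
# standing in for the NER model under attack.
vocab = {"Obama": 0, "visited": 1, "Paris": 2, "yesterday": 3}
emb = rng.normal(size=(len(vocab), DIM))
w = rng.normal(size=DIM)  # scorer weights

def attribution(tokens):
    """Gradient-times-input attribution per token (saliency-style)."""
    scores = {}
    for t in tokens:
        x = emb[vocab[t]]
        grad = w  # d(w . x)/dx = w for a linear scorer
        scores[t] = float(np.abs(grad * x).sum())
    return scores

tokens = ["Obama", "visited", "Paris", "yesterday"]
scores = attribution(tokens)

# Rank all tokens by attribution and keep the k most influential
# ones as candidates for adversarial perturbation.
top_k = sorted(scores, key=scores.get, reverse=True)[:2]
```

Because the ranking covers entity and non-entity tokens alike, the candidate set is not restricted to non-entity words, which is the broader search space the abstract contrasts against prior attacks.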
Anthology ID:
2023.findings-emnlp.830
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12437–12450
URL:
https://aclanthology.org/2023.findings-emnlp.830
DOI:
10.18653/v1/2023.findings-emnlp.830
Cite (ACL):
Xiaomeng Jin, Bhanukiran Vinzamuri, Sriram Venkatapathy, Heng Ji, and Pradeep Natarajan. 2023. Adversarial Robustness for Large Language NER models using Disentanglement and Word Attributions. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12437–12450, Singapore. Association for Computational Linguistics.
Cite (Informal):
Adversarial Robustness for Large Language NER models using Disentanglement and Word Attributions (Jin et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.830.pdf