Vulnerabilities of Large Language Models to Adversarial Attacks

Yu Fu, Erfan Shayegan, Md. Mamun Al Abdullah, Pedram Zaree, Nael Abu-Ghazaleh, Yue Dong


Abstract
This tutorial serves as a comprehensive guide on the vulnerabilities of Large Language Models (LLMs) to adversarial attacks, an interdisciplinary field that blends perspectives from Natural Language Processing (NLP) and Cybersecurity. As LLMs become more complex and integrated into various systems, understanding their security attributes is crucial. However, current research indicates that even safety-aligned models are not impervious to adversarial attacks that can result in incorrect or harmful outputs. The tutorial first lays the foundation by explaining safety-aligned LLMs and concepts in cybersecurity. It then categorizes existing research based on different types of learning architectures and attack methods. We highlight the existing vulnerabilities of unimodal LLMs, multi-modal LLMs, and systems that integrate LLMs, focusing on adversarial attacks designed to exploit weaknesses and mislead AI systems. Finally, the tutorial delves into the potential causes of these vulnerabilities and discusses potential defense mechanisms.
Anthology ID:
2024.acl-tutorials.5
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 5: Tutorial Abstracts)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Luis Chiruzzo, Hung-yi Lee, Leonardo Ribeiro
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
8–9
Language:
URL:
https://aclanthology.org/2024.acl-tutorials.5
DOI:
Bibkey:
Cite (ACL):
Yu Fu, Erfan Shayegan, Md. Mamun Al Abdullah, Pedram Zaree, Nael Abu-Ghazaleh, and Yue Dong. 2024. Vulnerabilities of Large Language Models to Adversarial Attacks. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 5: Tutorial Abstracts), pages 8–9, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Vulnerabilities of Large Language Models to Adversarial Attacks (Fu et al., ACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.acl-tutorials.5.pdf