Differential Robustness in Transformer Language Models: Empirical Evaluation under Adversarial Text Attacks

Taniya Gidatkar, Oluwaseun Ajao, Matthew Shardlow


Abstract
This study evaluates the resilience of large language models (LLMs) against adversarial attacks, focusing on Flan-T5, BERT, and RoBERTa-Base. Using systematically designed adversarial tests built with TextFooler and BERTAttack, we found significant variations in model robustness. RoBERTa-Base and Flan-T5 demonstrated remarkable resilience, maintaining accuracy even when subjected to sophisticated attacks, with attack success rates of 0%. In contrast, BERT-Base showed considerable vulnerability, with TextFooler achieving a 93.75% success rate by reducing model accuracy from 48% to just 3%. Our research reveals that while certain LLMs have developed effective defensive mechanisms, these safeguards often require substantial computational resources. This study contributes to the understanding of LLM security by identifying strengths and weaknesses in current safeguarding approaches, and it proposes practical recommendations for developing more efficient and effective defensive strategies.
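The 93.75% figure is consistent with reading attack success rate as the share of originally correct predictions that the attack flips. The abstract does not state the formula, so the sketch below is an assumption that happens to reproduce the reported numbers; the function name `attack_success_rate` is illustrative and not taken from the paper.

```python
def attack_success_rate(original_accuracy: float, attacked_accuracy: float) -> float:
    """Relative drop in accuracy caused by an attack.

    Assumption: success rate = (original - attacked) / original, i.e. the
    fraction of originally correct predictions that the attack flipped.
    Both accuracies must be on the same scale (here, percentage points).
    """
    if original_accuracy <= 0:
        raise ValueError("original_accuracy must be positive")
    return (original_accuracy - attacked_accuracy) / original_accuracy


# BERT-Base under TextFooler, using the figures quoted in the abstract.
rate = attack_success_rate(48.0, 3.0)
print(f"{rate:.2%}")  # 93.75%
```

Under the same reading, a success rate of 0% means accuracy under attack equals the original accuracy, which matches the description of RoBERTa-Base and Flan-T5 maintaining accuracy.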
Anthology ID:
2025.ranlp-1.48
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
395–402
URL:
https://aclanthology.org/2025.ranlp-1.48/
Cite (ACL):
Taniya Gidatkar, Oluwaseun Ajao, and Matthew Shardlow. 2025. Differential Robustness in Transformer Language Models: Empirical Evaluation under Adversarial Text Attacks. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 395–402, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Differential Robustness in Transformer Language Models: Empirical Evaluation under Adversarial Text Attacks (Gidatkar et al., RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.48.pdf