Reinforcement Learning-Driven LLM Agent for Automated Attacks on LLMs

Authors: Xiangwen Wang, Jie Peng, Kaidi Xu, Huaxiu Yao, Tianlong Chen
Date: 2024-08
Venue: Proceedings of the Fifth Workshop on Privacy in Natural Language Processing
Editors: Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Type: conference publication
Pages: 170-177
Identifier: wang-etal-2024-reinforcement-learning
URL: https://aclanthology.org/2024.privatenlp-1.17/