The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

Zihui Wu, Haichang Gao, Jianping He, Ping Wang


Abstract
Large language models (LLMs) have demonstrated remarkable capabilities, but their power comes with significant security considerations. While extensive research has been conducted on the safety of LLMs in chat mode, the security implications of their function calling feature have been largely overlooked. This paper uncovers a critical vulnerability in the function calling process of LLMs, introducing a novel “jailbreak function” attack method that exploits alignment discrepancies, user coercion, and the absence of rigorous safety filters. Our empirical study, conducted on six state-of-the-art LLMs including GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-pro, reveals an alarming average success rate of over 90% for this attack. We provide a comprehensive analysis of why function calls are susceptible to such attacks and propose defensive strategies, including the use of defensive prompts. Our findings highlight the urgent need for enhanced security measures in the function calling capabilities of LLMs, contributing to the field of AI safety by identifying a previously unexplored risk, designing an effective attack method, and suggesting practical defensive measures.
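To make the abstract's setting concrete, the following is a minimal, illustrative sketch of how a function-calling request is typically assembled and how a defensive prompt, of the kind the paper proposes, might be prepended. It does not reproduce the paper's attack or defense; the tool name (`lookup_weather`), model name, and prompt wording are all invented for illustration, and the payload follows the widely used OpenAI-style `tools` schema as an assumption.

```python
def build_request(user_message: str, defensive: bool = True) -> dict:
    """Assemble a chat-completion payload with one hypothetical tool.

    Illustrative only: the schema mimics the OpenAI-style "tools" format,
    and every name here is invented, not taken from the paper.
    """
    tool = {
        "type": "function",
        "function": {
            "name": "lookup_weather",  # hypothetical tool name
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    messages = [{"role": "user", "content": user_message}]
    if defensive:
        # A defensive prompt, in the spirit the abstract describes: remind the
        # model to apply the same safety checks to function-call content that
        # it applies to ordinary chat replies.
        messages.insert(0, {
            "role": "system",
            "content": (
                "Apply your full safety policy to the content of every "
                "function call and its arguments, exactly as you would "
                "to a normal chat reply."
            ),
        })
    return {"model": "example-model", "messages": messages, "tools": [tool]}

payload = build_request("What's the weather in Abu Dhabi?")
```

The point of the sketch is structural: the tool schema and its arguments form a second input/output channel, separate from the chat messages, which is why safety filtering applied only to chat content can miss it.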
Anthology ID:
2025.coling-main.39
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
584–592
URL:
https://aclanthology.org/2025.coling-main.39/
Cite (ACL):
Zihui Wu, Haichang Gao, Jianping He, and Ping Wang. 2025. The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, pages 584–592, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models (Wu et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.39.pdf