Can Language Models Follow Multiple Turns of Entangled Instructions?

Chi Han; Xin Liu (刘鑫); Haodong Wang; Shiyang Li; Jingfeng Yang; Haoming Jiang; Zhengyang Wang; Qingyu Yin; Liang Qiu; Changlong Yu; Yifan Gao; Zheng Li; Bing Yin; Jingbo Shang; Heng Ji

Can Language Models Follow Multiple Turns of Entangled Instructions?

Chi Han, Xin Liu, Haodong Wang, Shiyang Li, Jingfeng Yang, Haoming Jiang, Zhengyang Wang, Qingyu Yin, Liang Qiu, Changlong Yu, Yifan Gao, Zheng Li, Bing Yin, Jingbo Shang, Heng Ji

Abstract

Despite of significant achievements in improving instruction-following capabilities of large language models (LLMs), the ability to process multiple potentially entangled or conflict instructions remains a considerable challenge. Real-world scenarios often require the consistency across multiple instructions over time, such as secret privacy, presonal preferences, and prioritization, so we demand sophisticated abilities to integrate multiple turns and carefully balance competing objectives when instructions intersect or conflict. This work presents a systematic investigation of LLMs’ capabilities in handling multiple turns of instructions, covering three levels of difficulty: (1) retrieving information from instructions, (2) tracking and reasoning across turns, and (3) resolving conflicts among instructions. We construct MultiTurnInstruct with 1.1K high-quality multi-turn conversations through the human-in-the-loop approach and result in a total of nine capability categories, including statics and dynamics, reasoning and multitasking. Our finding reveals an intriguing trade-off between different capabilities. While GPT models demonstrate superior memorization, they show reduced effectiveness in privacy-protection tasks requiring selective information withholding. Larger models exhibit stronger reasoning capabilities but still struggle with resolving conflicting instructions. Importantly, these performance gaps cannot be attributed solely to information loss, as models demonstrate strong BLEU scores on memorization tasks but their attention mechanisms fail to effectively integrate multiple related instructions. These findings highlight critical areas for improvement in the complex real-world tasks involving multi-turn instructions.

Anthology ID:: 2025.findings-emnlp.1387
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2025
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: Findings
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 25445–25460
Language:
URL:: https://aclanthology.org/2025.findings-emnlp.1387/
DOI:
Bibkey:
Cite (ACL):: Chi Han, Xin Liu, Haodong Wang, Shiyang Li, Jingfeng Yang, Haoming Jiang, Zhengyang Wang, Qingyu Yin, Liang Qiu, Changlong Yu, Yifan Gao, Zheng Li, Bing Yin, Jingbo Shang, and Heng Ji. 2025. Can Language Models Follow Multiple Turns of Entangled Instructions?. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25445–25460, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Can Language Models Follow Multiple Turns of Entangled Instructions? (Han et al., Findings 2025)
Copy Citation:
PDF:: https://aclanthology.org/2025.findings-emnlp.1387.pdf
Checklist:: 2025.findings-emnlp.1387.checklist.pdf

PDF Cite Search Checklist Fix data