AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

Yun He; Wenzhe Li; Hejia Zhang; Songlin Li; Karishma Mandyam; Sopan Khosla; Yuanhao Xiong; Nanshu Wang; Xiaoliang Peng; Beibin Li; Shengjie Bi; Shishir G Patil; Qi Qi; Shengyu Feng; Julian Katz-Samuels; Richard Yuanzhe Pang; Sujan Kumar Gonugondla; Hunter Lang; Yue Yu; Yundi Qian; Maryam Fazel-Zarandi; Licheng Yu; Amine Benhalloum; Hany Hassan Awadalla; Manaal Faruqui

Abstract

Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF)—especially for complex, multi-turn, and system-prompted instructions—remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and reliable, interpretable reward signals. In this work, we introduce AdvancedIF, a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs’ ability to follow complex, multi-turn, and system-level instructions. We also open-source the evaluation script of AdvancedIF. We further propose RIFL (Rubric-based Instruction-Following Learning), a novel post-training pipeline that leverages rubric generation, a finetuned rubric verifier, and reward shaping to enable effective reinforcement learning for instruction following. Extensive experiments demonstrate that RIFL substantially improves the instruction-following abilities of LLMs, achieving a 6.7% absolute gain on AdvancedIF and strong results on public benchmarks. Our ablation studies confirm the effectiveness of each component in RIFL. This work establishes rubrics as a powerful tool for both training and evaluating advanced IF in LLMs, paving the way for more capable and reliable AI systems.

Anthology ID: 2026.acl-long.820
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 18003–18022
URL: https://aclanthology.org/2026.acl-long.820/
Cite (ACL): Yun He, Wenzhe Li, Hejia Zhang, Songlin Li, Karishma Mandyam, Sopan Khosla, Yuanhao Xiong, Nanshu Wang, Xiaoliang Peng, Beibin Li, Shengjie Bi, Shishir G Patil, Qi Qi, Shengyu Feng, Julian Katz-Samuels, Richard Yuanzhe Pang, Sujan Kumar Gonugondla, Hunter Lang, Yue Yu, Yundi Qian, Maryam Fazel-Zarandi, Licheng Yu, Amine Benhalloum, Hany Hassan Awadalla, and Manaal Faruqui. 2026. AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18003–18022, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following (He et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.820.pdf
Checklist: 2026.acl-long.820.checklist.pdf
