VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition

Hongbo Jin; Kuanwei Lin; Wenhao Zhang; Yichen Jin; Ge Li

Abstract

Reinforcement Learning (RL) is crucial for empowering Video-LLMs with complex spatiotemporal reasoning. However, current RL paradigms predominantly rely on random data shuffling or naive curriculum strategies based on scalar difficulty metrics. We argue that scalar metrics fail to disentangle two orthogonal challenges in video understanding: Visual-Temporal Perception Load and Cognitive Reasoning Depth. To address this, we propose VideoCuRL, a novel framework that decomposes difficulty into these two axes. We employ efficient, training-free proxies—optical flow/keyframe entropy for visual complexity and Calibrated Surprisal for cognitive complexity—to map data onto a 2D curriculum grid. A competence-aware Diagonal Wavefront strategy then schedules training from base alignment to complex reasoning. Furthermore, we introduce Dynamic Sparse KL and Structured Revisiting to stabilize training against reward collapse and catastrophic forgetting. Extensive experiments show that VideoCuRL surpasses strong RL baselines on reasoning (+2.5% on VSI-Bench) and perception (+2.9% on VideoMME) tasks. Notably, VideoCuRL eliminates the prohibitive inference overhead of generation-based curricula, offering a scalable solution for robust video post-training.
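The abstract names the parts of the curriculum but does not define them. The following is therefore only a minimal Python sketch of a 2D grid with a diagonal-wavefront schedule, built on several assumptions. Each clip is assumed to carry two precomputed difficulty scores, standing in for the visual-temporal proxy (optical flow / keyframe entropy) and the Calibrated Surprisal proxy. The grid is built by per-axis quantile binning. "Competence" is reduced to one scalar, such as mean reward on the unlocked cells. `Sample`, `assign_grid`, `DiagonalWavefront`, and the 0.7 threshold are hypothetical names and values, not the authors' code.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Sample:
    sample_id: str
    perception: float  # hypothetical precomputed visual-temporal load score (higher = harder)
    reasoning: float   # hypothetical precomputed cognitive-depth score (higher = harder)


def quantile_edges(values: Sequence[float], k: int) -> List[float]:
    """Return k - 1 cut points that split `values` into k roughly equal-size bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * q) // k] for q in range(1, k)]


def assign_grid(samples: Sequence[Sample], k: int) -> Dict[Tuple[int, int], List[Sample]]:
    """Place each sample in cell (perception_bin, reasoning_bin), both in 0..k-1."""
    p_edges = quantile_edges([s.perception for s in samples], k)
    r_edges = quantile_edges([s.reasoning for s in samples], k)
    grid: Dict[Tuple[int, int], List[Sample]] = {}
    for s in samples:
        cell = (bisect_right(p_edges, s.perception), bisect_right(r_edges, s.reasoning))
        grid.setdefault(cell, []).append(s)
    return grid


class DiagonalWavefront:
    """Hypothetical scheduler: unlock cells with row + col <= frontier and push the
    frontier out by one anti-diagonal whenever competence reaches `threshold`."""

    def __init__(self, k: int, threshold: float = 0.7) -> None:
        self.k = k
        self.threshold = threshold  # hypothetical value
        self.frontier = 0           # start from the easiest cell (0, 0)

    def active_cells(self) -> List[Tuple[int, int]]:
        return [(i, j) for i in range(self.k) for j in range(self.k) if i + j <= self.frontier]

    def update(self, competence: float) -> bool:
        """Advance one anti-diagonal if competence >= threshold; return True if advanced."""
        last = 2 * (self.k - 1)  # anti-diagonal containing the hardest cell (k-1, k-1)
        if competence >= self.threshold and self.frontier < last:
            self.frontier += 1
            return True
        return False


if __name__ == "__main__":
    # Toy data: 9 clips spread evenly over a 3x3 grid.
    samples = [Sample(f"s{i}", perception=float(i % 3), reasoning=float(i // 3)) for i in range(9)]
    grid = assign_grid(samples, k=3)
    sched = DiagonalWavefront(k=3)
    print(sched.active_cells())  # [(0, 0)]
    sched.update(0.8)            # e.g. mean reward on the unlocked cells
    print(sched.active_cells())  # [(0, 0), (0, 1), (1, 0)]
    pool = [s.sample_id for c in sched.active_cells() for s in grid.get(c, [])]
    print(pool)                  # ['s0', 's3', 's1']
```

In this sketch, cells unlock cumulatively, so easier cells stay in the sampling pool after the frontier moves. The abstract does not specify how VideoCuRL mixes earlier and newer cells (its Structured Revisiting), and the sketch does not model it.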

Anthology ID: 2026.acl-long.953
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 20832–20845
URL: https://aclanthology.org/2026.acl-long.953/
Cite (ACL): Hongbo Jin, Kuanwei Lin, Wenhao Zhang, Yichen Jin, and Ge Li. 2026. VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 20832–20845, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition (Jin et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.953.pdf
Checklist: 2026.acl-long.953.checklist.pdf