Enhancing Scene Transition Awareness in Video Generation via Post-Training

Hanwen Shen; Jiajie Lu; Yupeng Cao; Xiaonan Yang

Abstract

Recent advances in AI-generated video have shown strong performance on text-to-video tasks, particularly for short clips depicting a single scene. However, current models struggle to generate longer videos with coherent scene transitions, primarily because they cannot infer when a transition is needed from the prompt. Most open-source models are trained on datasets consisting of single-scene video clips, which limits their capacity to learn and respond to prompts requiring multiple scenes. Developing scene transition awareness is essential for multi-scene generation, as it allows models to identify and segment videos into distinct clips by accurately detecting transitions. To address this, we introduce the Transition-Aware Video (TAV) dataset with multi-scene clips and captions that explicitly state scene segmentation and transition structure. Our focus is on how prompt semantics and dataset annotations about temporal context affect text-to-video generation. Post-training on TAV improves alignment between the scene count implied by the prompt and the scene count produced by the model, while preserving visual quality.

Anthology ID: 2025.findings-ijcnlp.41
Volume: Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month: December
Year: 2025
Address: Mumbai, India
Editors: Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venue: Findings
Publisher: The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Pages: 706–721
URL: https://aclanthology.org/2025.findings-ijcnlp.41/
Cite (ACL): Hanwen Shen, Jiajie Lu, Yupeng Cao, and Xiaonan Yang. 2025. Enhancing Scene Transition Awareness in Video Generation via Post-Training. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 706–721, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal): Enhancing Scene Transition Awareness in Video Generation via Post-Training (Shen et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-ijcnlp.41.pdf