UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning

Xiaolong Wei; Zerun Zhu; Simin Niu; Xingyu Zhang; Peiying Yu; Changxuan Xiao; Yuchen Li; Jicheng Yang; Zhejun Zhao; Chong Meng; Long Xia; Daiting Shi

Abstract

A fundamental challenge in creative writing lies in resolving the inherent tension between maintaining global coherence in long-form narratives and preserving local expressiveness in short-form texts. Long-context generation requires explicit macroscopic planning, whereas short-form creativity often demands spontaneous, unconstrained expression. Existing alignment paradigms, however, typically employ static reward signals and rely heavily on high-quality supervised data, which is costly and difficult to scale. To address these limitations, we propose UniCreative, a unified reference-free reinforcement learning framework. We first introduce AC-GenRM, an adaptive constraint-aware reward model that dynamically synthesizes query-specific criteria to provide fine-grained preference judgments. Leveraging these signals, we propose ACPO, a policy optimization algorithm that aligns models with human preferences in both content quality and structural paradigm without supervised fine-tuning or ground-truth references. Empirical results demonstrate that AC-GenRM aligns closely with expert evaluations, while ACPO significantly improves performance across diverse writing tasks. Crucially, our analysis reveals an emergent meta-cognitive ability: the model learns to autonomously distinguish tasks that require rigorous planning from those that favor direct generation, validating the effectiveness of our direct alignment approach.

Anthology ID: 2026.findings-acl.1179
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 23563–23583
URL: https://aclanthology.org/2026.findings-acl.1179/
Cite (ACL): Xiaolong Wei, Zerun Zhu, Simin Niu, Xingyu Zhang, Peiying Yu, Changxuan Xiao, Yuchen Li, Jicheng Yang, Zhejun Zhao, Chong Meng, Long Xia, and Daiting Shi. 2026. UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23563–23583, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning (Wei et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1179.pdf
Checklist: 2026.findings-acl.1179.checklist.pdf
