Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation

Zihao Li; Xuekong Xu; Ziyao Chen; Lixin Zou; Haijun Wu; Qiang Chen; Chenliang Li

doi:10.18653/v1/2025.findings-acl.823

Abstract

Multi-style outline controllable generation is crucial for multiple applications, including document semantic structuring and retrieval-augmented generation. The great success of preference alignment approaches encourages their application to controllable generation tasks. However, these attempts encounter several limitations: (1) the requirement for response pairs, (2) substantial computation costs, and (3) insufficient exploitation of fine-grained preference signals. To address these problems, we propose a token-level preference self-alignment optimization, named TKPO, for outline controllable generation. TKPO extends the Bradley-Terry model from pair-wise to list-wise comparison, which is further applied at the token level to exploit fine-grained preference signals. In contrast to representative methods such as DPO, TKPO does not require response pairs; instead, we propose a controllable attributes-driven method to construct reject samples for self-alignment. Additionally, TKPO optimizes only the base model, thereby avoiding additional memory usage and substantial computational costs. We curate two outline controllable generation datasets, covering language style and level of detail. Extensive experiments demonstrate that TKPO outperforms DPO by up to 19.28% in performance while requiring only 56.25% of the training time. We release the code and dataset resources at https://github.com/WHUIR/TKPO.
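As a rough illustration of the pair-wise to list-wise extension mentioned in the abstract, the following minimal sketch compares one preferred candidate against a list of rejected candidates at each token position. It is an assumption-laden toy, not the paper's formulation: the function names (`pairwise_bt`, `listwise_bt`, `token_level_loss`), the use of raw scalar scores, and the softmax-style generalization are all hypothetical choices made here for illustration. The actual TKPO objective is in the released code.

```python
import math

def pairwise_bt(score_chosen, score_rejected):
    """Standard Bradley-Terry probability that the chosen item beats one rejected item."""
    return 1.0 / (1.0 + math.exp(score_rejected - score_chosen))

def listwise_bt(score_chosen, scores_rejected):
    """Assumed list-wise generalization: the chosen item against a list of rejected items.

    With a single rejected item this reduces to pairwise_bt.
    """
    scores = [score_chosen] + list(scores_rejected)
    m = max(scores)  # subtract the max for numerical stability
    denom = sum(math.exp(s - m) for s in scores)
    return math.exp(score_chosen - m) / denom

def token_level_loss(chosen_scores, rejected_scores_per_token):
    """Mean negative log-likelihood of the chosen token over all positions (hypothetical)."""
    losses = [
        -math.log(listwise_bt(c, r))
        for c, r in zip(chosen_scores, rejected_scores_per_token)
    ]
    return sum(losses) / len(losses)

# Two token positions, each with one chosen score and two rejected scores.
loss = token_level_loss([2.0, 1.5], [[0.5, 0.1], [1.0, -0.3]])
print(f"toy token-level list-wise loss: {loss:.4f}")
```

In this sketch the loss decreases as the chosen token's score rises relative to the rejected ones at the same position, which is the sense in which the preference signal is applied per token rather than per response.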

Anthology ID: 2025.findings-acl.823
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 15974–16007
URL: https://aclanthology.org/2025.findings-acl.823/
DOI: 10.18653/v1/2025.findings-acl.823
Cite (ACL): Zihao Li, Xuekong Xu, Ziyao Chen, Lixin Zou, Haijun Wu, Qiang Chen, and Chenliang Li. 2025. Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15974–16007, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation (Li et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.823.pdf
