Unified Thinker: A General Reasoning Core for Image Generation

Sashuai Zhou; Qiang Zhou (周强); Jijin Hu; Hanqing Yang; Yue Cao; Junpeng Ma; Yinchao Ma; Jun Song; Tiezheng Ge; Cheng Yu; Bo Zheng; Zhou Zhao

Abstract

Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning–execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, leaving current open-source models substantially behind. We argue that closing this gap requires not merely better visual generators but executable reasoning: decomposing high-level intents into grounded, verifiable plans that directly steer the generative process. To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire generative model. We further introduce a two-stage training paradigm: we first build a structured planning interface for the Thinker, then apply reinforcement learning to ground its policy in pixel-level feedback, encouraging plans that optimize visual correctness over textual plausibility. Extensive experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.
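The abstract describes the architecture only at a high level, so the following is a toy sketch under stated assumptions, not the authors' implementation. Every name in it (`Plan`, `Thinker`, `Generator`, `FixedThinker`, `GridGenerator`, `run`, `pixel_reward`, `group_advantages`) is hypothetical, the "image" is an 8x8 binary grid, and the group-normalized advantage is a GRPO-style choice of ours, since the abstract does not name the reinforcement-learning algorithm. It illustrates two ideas from the abstract: the Generator consumes only a structured plan, so the Thinker can be swapped without touching the Generator; and the reward is computed from the rendered pixels rather than from the plan text.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Protocol

Image = list[list[int]]  # toy 8x8 binary "image" (assumption, stands in for real pixels)


@dataclass(frozen=True)
class Plan:
    """Hypothetical structured plan: the only thing the Thinker hands to the Generator."""
    steps: tuple[str, ...]
    object_count: int


class Thinker(Protocol):
    def plan(self, instruction: str) -> Plan: ...


class Generator(Protocol):
    def render(self, plan: Plan) -> Image: ...


class FixedThinker:
    """Toy Thinker that always plans `count` objects (stands in for a learned policy)."""

    def __init__(self, count: int) -> None:
        self.count = count

    def plan(self, instruction: str) -> Plan:
        steps = (f"parse '{instruction}'", f"place {self.count} objects")
        return Plan(steps=steps, object_count=self.count)


class GridGenerator:
    """Toy Generator: lights one pixel per planned object on an 8x8 grid."""

    def render(self, plan: Plan) -> Image:
        image = [[0] * 8 for _ in range(8)]
        for i in range(min(plan.object_count, 64)):
            image[i // 8][i % 8] = 1
        return image


def run(thinker: Thinker, generator: Generator, instruction: str) -> Image:
    """The Generator sees only the Plan, so either module can be replaced independently."""
    return generator.render(thinker.plan(instruction))


def pixel_reward(image: Image, target_count: int) -> float:
    """Scores the rendered pixels, not the plan text: 1.0 iff the image shows target_count objects."""
    return 1.0 if sum(map(sum, image)) == target_count else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized advantages (GRPO-style; an assumption, not the paper's stated method)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    instruction = "three apples on a table"  # the visually correct answer is 3 objects
    generator = GridGenerator()
    # Several candidate plans for one instruction, e.g. sampled from the Thinker policy.
    candidates = [FixedThinker(c) for c in (2, 3, 3, 4)]
    rewards = [pixel_reward(run(t, generator, instruction), target_count=3) for t in candidates]
    print(rewards)
    print([round(a, 3) for a in group_advantages(rewards)])
```

Running the script scores four candidate plans for the same instruction; only the two whose rendered grids actually contain three objects receive a positive advantage. That is one way to read "plans that optimize visual correctness over textual plausibility": the policy update is driven by what the Generator drew, not by how convincing the plan sounds.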

Anthology ID: 2026.acl-long.484
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 10597–10613
URL: https://aclanthology.org/2026.acl-long.484/
Cite (ACL): Sashuai Zhou, Qiang Zhou, Jijin Hu, Hanqing Yang, Yue Cao, Junpeng Ma, Yinchao Ma, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng, and Zhou Zhao. 2026. Unified Thinker: A General Reasoning Core for Image Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10597–10613, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Unified Thinker: A General Reasoning Core for Image Generation (Zhou et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.484.pdf
Checklist: 2026.acl-long.484.checklist.pdf