Jiaxin Mao
2024
Prompt Refinement with Image Pivot for Text-to-Image Generation
Jingtao Zhan
|
Qingyao Ai
|
Yiqun Liu
|
Yingwei Pan
|
Ting Yao
|
Jiaxin Mao
|
Shaoping Ma
|
Tao Mei
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from “user languages” into “system languages”. However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP innovatively uses the latent representation of a user-preferred image as an intermediary “pivot” between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and subsequently translating image representations into system languages. Thus, it can leverage abundant data for training. Extensive experiments show that PRIP substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
Search
Co-authors
- Jingtao Zhan 1
- Qingyao Ai 1
- Yiqun Liu 1
- Yingwei Pan 1
- Ting Yao 1
- show all...
Venues
- acl1