Improved Visual Story Generation with Adaptive Context Modeling

Zhangyin Feng; Yuchen Ren; Xinmiao Yu; Xiaocheng Feng; Duyu Tang; Shuming Shi; Bing Qin (秦兵)

doi:10.18653/v1/2023.findings-acl.305

Abstract

Diffusion models developed on top of powerful text-to-image generation models like Stable Diffusion achieve remarkable success in visual story generation. However, the best-performing approach considers historically generated results as flattened memory cells, ignoring the fact that not all preceding images contribute equally to the generation of the characters and scenes at the current stage. To address this, we present a simple method that improves the leading system with adaptive context modeling, which is not only incorporated in the encoder but also adopted as additional guidance in the sampling stage to boost the global consistency of the generated story. We evaluate our model on the PororoSV and FlintstonesSV datasets and show that our approach achieves state-of-the-art FID scores in both the story visualization and story continuation scenarios. We conduct a detailed model analysis and show that our model excels at generating semantically consistent images for stories.

Anthology ID: 2023.findings-acl.305
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4939–4955
URL: https://aclanthology.org/2023.findings-acl.305/
DOI: 10.18653/v1/2023.findings-acl.305
Cite (ACL): Zhangyin Feng, Yuchen Ren, Xinmiao Yu, Xiaocheng Feng, Duyu Tang, Shuming Shi, and Bing Qin. 2023. Improved Visual Story Generation with Adaptive Context Modeling. In Findings of the Association for Computational Linguistics: ACL 2023, pages 4939–4955, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Improved Visual Story Generation with Adaptive Context Modeling (Feng et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.305.pdf
