Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text

Dingyi Yang, Qin Jin


Abstract
Most research on stylized image captioning aims to generate style-specific captions using unpaired text, and has achieved impressive performance for simple styles such as positive and negative. However, unlike previous single-sentence captions whose style is mostly embodied in distinctive words or phrases, real-world styles are likely to be implied at the syntactic and discourse levels. In this work, we introduce a new task of Stylized Visual Storytelling (SVST), which aims to describe a photo stream with stylized stories that are more expressive and attractive. We propose a multitasking memory-augmented framework called StyleVSG, which is jointly trained on factual visual storytelling data and an unpaired style corpus, achieving a trade-off between style accuracy and visual relevance. In particular, for unpaired stylized text, StyleVSG learns to reconstruct the stylistic story from roughly parallel visual inputs mined with the CLIP model, avoiding problems caused by random mapping in previous methods. Furthermore, a memory module is designed to preserve the consistency and coherence of generated stories. Experiments show that our method can generate attractive and coherent stories in different styles such as fairy tale, romance, and humor. Overall, StyleVSG surpasses state-of-the-art methods on both automatic and human evaluation metrics.
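
The abstract mentions mining "roughly parallel visual inputs" for unpaired stylized text with CLIP. The following is only an illustrative sketch of that general idea, not the authors' procedure: it assumes text and image embeddings from a shared space are already available (here replaced by hypothetical 3-dimensional toy vectors) and pairs each sentence with its most similar image by cosine similarity. The function names `cosine` and `mine_pseudo_pairs`, the file names, and the `top_k` parameter are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mine_pseudo_pairs(
    text_embs: Dict[str, Sequence[float]],
    image_embs: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> Dict[str, List[str]]:
    """For each sentence id, return the ids of the top_k most similar images."""
    pairs: Dict[str, List[str]] = {}
    for text_id, text_vec in text_embs.items():
        ranked = sorted(
            image_embs,
            key=lambda image_id: cosine(text_vec, image_embs[image_id]),
            reverse=True,
        )
        pairs[text_id] = ranked[:top_k]
    return pairs


# Hypothetical toy vectors standing in for real text and image embeddings.
text_embs = {
    "fairy_tale_sentence": [0.9, 0.1, 0.0],
    "romance_sentence": [0.0, 0.2, 0.9],
}
image_embs = {
    "castle.jpg": [1.0, 0.0, 0.1],
    "beach_sunset.jpg": [0.1, 0.1, 1.0],
    "office.jpg": [0.0, 1.0, 0.0],
}

pairs = mine_pseudo_pairs(text_embs, image_embs, top_k=1)
print(pairs)
```

In this toy setup each stylized sentence is matched to the image whose vector points in the most similar direction; a real pipeline would obtain the vectors from a pretrained text-image encoder and would likely retrieve several images per story.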
Anthology ID:
2023.acl-long.619
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
11053–11066
URL:
https://aclanthology.org/2023.acl-long.619
DOI:
10.18653/v1/2023.acl-long.619
Cite (ACL):
Dingyi Yang and Qin Jin. 2023. Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11053–11066, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text (Yang & Jin, ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.619.pdf
Video:
https://aclanthology.org/2023.acl-long.619.mp4