Incorporating Textual Evidence in Visual Storytelling

Tianyi Li, Sujian Li


Abstract
Previous work on visual storytelling has mainly focused on exploiting the image sequence as evidence for storytelling, neglecting textual evidence that could guide story generation. Motivated by the human storytelling process, in which people recall stories associated with familiar images, we exploit textual evidence from similar images to help generate coherent and meaningful stories. To pick the images most likely to provide useful textual evidence, we propose a two-step ranking method based on image object recognition. To utilize the textual information, we design an extended Seq2Seq model with a two-channel encoder and attention. Experiments on the VIST dataset show that our method outperforms state-of-the-art baseline models without heavy engineering.
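The abstract describes a Seq2Seq decoder that attends over two encoder channels (image features and retrieved textual evidence). The following is a minimal, hypothetical sketch of such a two-channel attention decoder, not the authors' implementation; module names, dimensions, and the concatenation-based fusion are illustrative assumptions.

```python
# Hypothetical sketch of a two-channel encoder/attention setup for story generation.
# Not the authors' code: layer sizes and the fusion-by-concatenation choice are assumptions.
import torch
import torch.nn as nn


class TwoChannelAttention(nn.Module):
    """Additive attention applied separately to each encoder channel."""
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size * 2, hidden_size)
        self.score = nn.Linear(hidden_size, 1)

    def attend(self, query, memory):
        # query: (batch, hidden); memory: (batch, steps, hidden)
        q = query.unsqueeze(1).expand(-1, memory.size(1), -1)
        e = self.score(torch.tanh(self.proj(torch.cat([q, memory], dim=-1))))
        a = torch.softmax(e, dim=1)          # attention weights over memory steps
        return (a * memory).sum(dim=1)       # context vector, (batch, hidden)

    def forward(self, query, image_memory, text_memory):
        # One context vector per channel; the decoder fuses them downstream.
        return self.attend(query, image_memory), self.attend(query, text_memory)


class StoryDecoder(nn.Module):
    """GRU decoder conditioned on both visual and textual contexts."""
    def __init__(self, vocab_size, emb_size=256, hidden_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.attn = TwoChannelAttention(hidden_size)
        self.gru = nn.GRUCell(emb_size + 2 * hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def step(self, token, hidden, image_memory, text_memory):
        # token: (batch,) word ids; hidden: (batch, hidden)
        img_ctx, txt_ctx = self.attn(hidden, image_memory, text_memory)
        gru_in = torch.cat([self.embed(token), img_ctx, txt_ctx], dim=-1)
        hidden = self.gru(gru_in, hidden)
        return self.out(hidden), hidden
```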
Anthology ID:
W19-8102
Volume:
Proceedings of the 1st Workshop on Discourse Structure in Neural NLG
Month:
November
Year:
2019
Address:
Tokyo, Japan
Editors:
Anusha Balakrishnan, Vera Demberg, Chandra Khatri, Abhinav Rastogi, Donia Scott, Marilyn Walker, Michael White
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
13–17
URL:
https://aclanthology.org/W19-8102
DOI:
10.18653/v1/W19-8102
Cite (ACL):
Tianyi Li and Sujian Li. 2019. Incorporating Textual Evidence in Visual Storytelling. In Proceedings of the 1st Workshop on Discourse Structure in Neural NLG, pages 13–17, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Incorporating Textual Evidence in Visual Storytelling (Li & Li, INLG 2019)
PDF:
https://aclanthology.org/W19-8102.pdf
Data
VIST