Summary of the Visually Grounded Story Generation Challenge

Xudong Hong, Asad Sayeed, Vera Demberg


Abstract
Recent advancements in vision-and-language models have opened new possibilities for natural language generation, particularly in generating creative stories from visual input. We thus host an open-source shared task, Visually Grounded Story Generation (VGSG), to explore whether these models can create coherent, diverse, and visually grounded narratives. The task challenges participants to generate coherent stories from sequences of images, where characters and events must be grounded in the images provided. It is structured into two tracks: the Closed track, which constrains participants to fixed visual features, and the Open track, which allows all kinds of models. As the baseline for the Open track, we propose the first two-stage model using GPT-4o, which first generates descriptions of the images and then creates a story based on those descriptions. Human and automatic evaluations indicate that 1) retrieval augmentation helps generate more human-like stories; 2) large-scale pre-trained LLMs improve story quality by a large margin; and 3) traditional automatic metrics cannot capture overall quality.
Anthology ID:
2024.inlg-genchal.3
Volume:
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Simon Mille, Miruna-Adriana Clinciu
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
39–46
URL:
https://aclanthology.org/2024.inlg-genchal.3
Cite (ACL):
Xudong Hong, Asad Sayeed, and Vera Demberg. 2024. Summary of the Visually Grounded Story Generation Challenge. In Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges, pages 39–46, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Summary of the Visually Grounded Story Generation Challenge (Hong et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-genchal.3.pdf