MultiSubs: A Large-scale Multimodal and Multilingual Dataset

Josiah Wang, Josiel Figueiredo, Lucia Specia


Abstract
This paper introduces a large-scale multimodal and multilingual dataset that aims to facilitate research on grounding words to images in their contextual usage in language. The dataset consists of images selected to unambiguously illustrate concepts expressed in sentences from movie subtitles. The dataset is a valuable resource as (i) the images are aligned to text fragments rather than whole sentences; (ii) multiple images are possible for a text fragment and a sentence; (iii) the sentences are free-form and realistic; (iv) the parallel texts are multilingual. We also set up a fill-in-the-blank game for humans to evaluate the quality of the automatic image selection process of our dataset. Finally, we propose a fill-in-the-blank task to demonstrate the utility of the dataset, and present some baseline prediction models. The dataset will benefit research on visual grounding of words, especially in the context of free-form sentences, and can be obtained from https://doi.org/10.5281/zenodo.5034604 under a Creative Commons licence.
Anthology ID:
2022.lrec-1.730
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6776–6785
URL:
https://aclanthology.org/2022.lrec-1.730
Cite (ACL):
Josiah Wang, Josiel Figueiredo, and Lucia Specia. 2022. MultiSubs: A Large-scale Multimodal and Multilingual Dataset. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6776–6785, Marseille, France. European Language Resources Association.
Cite (Informal):
MultiSubs: A Large-scale Multimodal and Multilingual Dataset (Wang et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.730.pdf
Code
 josiahwang/multisubs-eval
Data
MultiSubs
OpenSubtitles
Visual Question Answering