Towards Multi-Modal Text-Image Retrieval to improve Human Reading

Florian Schneider, Özge Alaçam, Xintong Wang, Chris Biemann


Abstract
In primary school, children’s books, as well as in modern language learning apps, multi-modal learning strategies like illustrations of terms and phrases are used to support reading comprehension. Also, several studies in educational psychology suggest that integrating cross-modal information will improve reading comprehension. We claim that state-of-the-art multi-modal transformers, which could be used in a language learner context to improve human reading, will perform poorly because of the short and relatively simple textual data those models are trained with. To prove our hypotheses, we collected a new multi-modal image-retrieval dataset based on data from Wikipedia. In an in-depth data analysis, we highlight the differences between our dataset and other popular datasets. Additionally, we evaluate several state-of-the-art multi-modal transformers on text-image retrieval on our dataset and analyze their meager results, which verify our claims.
Anthology ID:
2021.naacl-srw.21
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Month:
June
Year:
2021
Address:
Online
Editors:
Esin Durmus, Vivek Gupta, Nelson Liu, Nanyun Peng, Yu Su
Venue:
NAACL
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2021.naacl-srw.21
Cite (ACL):
Florian Schneider, Özge Alaçam, Xintong Wang, and Chris Biemann. 2021. Towards Multi-Modal Text-Image Retrieval to improve Human Reading. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Multi-Modal Text-Image Retrieval to improve Human Reading (Schneider et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-srw.21.pdf
Video:
https://aclanthology.org/2021.naacl-srw.21.mp4
Data
Conceptual Captions, Flickr30k, MS COCO, WikiCaps