Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets

Jack Hessel, David Mimno, Lillian Lee


Abstract
Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.
Anthology ID:
N18-1199
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2194–2205
URL:
https://aclanthology.org/N18-1199
DOI:
10.18653/v1/N18-1199
Cite (ACL):
Jack Hessel, David Mimno, and Lillian Lee. 2018. Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2194–2205, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets (Hessel et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1199.pdf
Note:
 N18-1199.Notes.pdf
Code
 victorssilva/concreteness
Data
Flickr30k, ImageNet, MS COCO