Evaluating Semantic Relations in Predicting Textual Labels for Images of Abstract and Concrete Concepts

Tarun Tater, Sabine Schulte im Walde, Diego Frassinelli


Abstract
This study investigates the performance of SigLIP, a state-of-the-art Vision-Language Model (VLM), in predicting labels for images depicting 1,278 concepts. Our analysis across 300 images per concept shows that the model frequently predicts the exact user-tagged labels, but it just as often predicts labels that are semantically related to those tags in various ways: synonyms, hypernyms, co-hyponyms, and associated words, particularly for abstract concepts. We then zoom in on the diversity of image user tags and of word associations for abstract versus concrete concepts. Surprisingly, not only abstract but also concrete concepts exhibit substantial variability, challenging the traditional view that representations of concrete concepts are less diverse.
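For context, the following minimal sketch illustrates the kind of zero-shot label prediction pipeline the abstract describes, using the public SigLIP checkpoint on the Hugging Face Hub together with a WordNet-based relation check. The image path, candidate label set, and the hypernym test are illustrative assumptions, not the authors' actual data or evaluation procedure.

    # Minimal sketch of zero-shot label prediction with SigLIP (hypothetical
    # image and label set; not the authors' actual pipeline or data).
    import torch
    from PIL import Image
    from transformers import AutoProcessor, AutoModel
    from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

    model = AutoModel.from_pretrained("google/siglip-base-patch16-224")
    processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

    image = Image.open("example_image.jpg")        # hypothetical image file
    candidate_labels = ["freedom", "bird", "sky"]  # hypothetical label set
    texts = [f"a photo of {label}" for label in candidate_labels]

    # SigLIP scores each text against the image; padding="max_length"
    # matches how the model was trained.
    inputs = processor(text=texts, images=image,
                       padding="max_length", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_labels)
    # SigLIP applies a sigmoid (not a softmax) to image-text pair logits.
    scores = torch.sigmoid(logits)[0]

    predicted = candidate_labels[scores.argmax().item()]
    print(f"predicted label: {predicted}")

    # One plausible way to check a semantic relation between a predicted
    # label and a user tag: is word_a a WordNet hypernym of word_b?
    def is_hypernym(word_a: str, word_b: str) -> bool:
        """True if some sense of word_a lies on a hypernym path of word_b."""
        synsets_a = set(wn.synsets(word_a))
        for syn_b in wn.synsets(word_b):
            for path in syn_b.hypernym_paths():
                if synsets_a.intersection(path[:-1]):  # exclude syn_b itself
                    return True
        return False

    print(is_hypernym("animal", "bird"))  # True: 'animal' subsumes 'bird'

The paper compares predicted labels to user tags along exactly these relation types (synonyms, hypernyms, co-hyponyms, and associations); the WordNet check above is just one way such a comparison could be operationalised.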
Anthology ID: 2024.cmcl-1.18
Volume: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Yohei Oseki
Venues: CMCL | WS
Publisher: Association for Computational Linguistics
Pages: 214–220
URL: https://aclanthology.org/2024.cmcl-1.18
Cite (ACL): Tarun Tater, Sabine Schulte im Walde, and Diego Frassinelli. 2024. Evaluating Semantic Relations in Predicting Textual Labels for Images of Abstract and Concrete Concepts. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 214–220, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Evaluating Semantic Relations in Predicting Textual Labels for Images of Abstract and Concrete Concepts (Tater et al., CMCL-WS 2024)
PDF: https://aclanthology.org/2024.cmcl-1.18.pdf