Individuation in Neural Models with and without Visual Grounding

Alexey Tikhonov, Lisa Bylinina, Ivan P. Yamshchikov


Abstract
We show differences between the language-and-vision model CLIP and two text-only models, FastText and SBERT, in how they encode individuation information. We study the latent representations that CLIP provides for substrates, granular aggregates, and various numbers of objects. We demonstrate that CLIP embeddings capture quantitative differences in individuation better than models trained on text-only data. Moreover, the individuation hierarchy we deduce from the CLIP embeddings agrees with the hierarchies proposed in linguistics and cognitive science.
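
For readers who want to probe this kind of comparison themselves, the sketch below embeds a few individuation-graded phrases with CLIP's text encoder and with an SBERT model and prints their pairwise cosine similarities. The checkpoints ("openai/clip-vit-base-patch32", "all-MiniLM-L6-v2") and the example phrases are illustrative assumptions, not the paper's actual stimuli or pipeline.

# Illustrative sketch (not the authors' pipeline): compare CLIP text embeddings
# with SBERT embeddings for phrases along an individuation scale.
# Checkpoints and phrases are assumptions chosen for demonstration only.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor
from sentence_transformers import SentenceTransformer

phrases = ["water", "sand", "rice", "a pile of pebbles", "three apples"]

# CLIP text encoder (language-and-vision model)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
with torch.no_grad():
    inputs = processor(text=phrases, return_tensors="pt", padding=True)
    clip_emb = clip.get_text_features(**inputs)

# SBERT (text-only model)
sbert = SentenceTransformer("all-MiniLM-L6-v2")
sbert_emb = torch.tensor(sbert.encode(phrases))

def cosine_matrix(x: torch.Tensor) -> torch.Tensor:
    # Pairwise cosine similarities between row vectors.
    x = F.normalize(x, dim=-1)
    return x @ x.T

print("CLIP text-embedding similarities:\n", cosine_matrix(clip_emb))
print("SBERT embedding similarities:\n", cosine_matrix(sbert_emb))

Inspecting how similarity patterns differ across the two similarity matrices gives a rough, informal feel for the kind of contrast the paper analyzes systematically.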
Anthology ID: 2024.nlp4science-1.21
Volume: Proceedings of the 1st Workshop on NLP for Science (NLP4Science)
Month: November
Year: 2024
Address: Miami, FL, USA
Editors: Lotem Peled-Cohen, Nitay Calderon, Shir Lissak, Roi Reichart
Venue: NLP4Science
Publisher: Association for Computational Linguistics
Pages: 240–248
URL: https://aclanthology.org/2024.nlp4science-1.21
Cite (ACL): Alexey Tikhonov, Lisa Bylinina, and Ivan P. Yamshchikov. 2024. Individuation in Neural Models with and without Visual Grounding. In Proceedings of the 1st Workshop on NLP for Science (NLP4Science), pages 240–248, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal): Individuation in Neural Models with and without Visual Grounding (Tikhonov et al., NLP4Science 2024)
PDF: https://aclanthology.org/2024.nlp4science-1.21.pdf