Three Studies on Predicting Word Concreteness with Embedding Vectors

Michael Flor


Abstract
Human-assigned concreteness ratings for words are commonly used in psycholinguistic and computational linguistic studies. Previous research has shown that such ratings can be modeled and extrapolated using dense word-embedding representations. However, because of rater disagreement, a considerable number of the human ratings in published datasets are not reliable. We investigate how such unreliable data influence the modeling of concreteness with word embeddings. Study 1 compares fourteen embedding models over three datasets of concreteness ratings, showing that most models achieve high correlations with human ratings and exhibit low error rates on predictions. Study 2 investigates how excluding the less reliable ratings influences the modeling results, and indicates that improved results can be achieved when the data are cleaned. Study 3 adds further conditions to those of Study 2 and indicates that the improvement holds only for the cleaned data, and that in the general case removing the less reliable data points is not useful.
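The abstract does not spell out the modeling setup. The following is a minimal sketch under stated assumptions, not the paper's confirmed procedure: concreteness is predicted from embedding vectors with ridge regression, evaluated out-of-fold by Pearson correlation and mean absolute error, and "less reliable" ratings are taken to be those whose rater standard deviation exceeds a threshold. The data are synthetic random vectors, not any of the fourteen models or three datasets, and the names `ridge_fit`, `cross_val_predict`, `evaluate` and the threshold `max_sd` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def cross_val_predict(X, y, k=5, alpha=1.0, seed=0):
    """Out-of-fold predictions: each item is predicted by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        w, b = ridge_fit(X[train_mask], y[train_mask], alpha)
        preds[test_idx] = X[test_idx] @ w + b
    return preds


def evaluate(y_true, y_pred):
    """Pearson correlation and mean absolute error against human ratings."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    mae = np.mean(np.abs(y_true - y_pred))
    return r, mae


# Hypothetical synthetic data: 500 "words", 50-dim "embeddings",
# a mean rating per word plus a per-word rater standard deviation (SD).
rng = np.random.default_rng(42)
n, d = 500, 50
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
rating_sd = rng.uniform(0.5, 1.8, size=n)
# Assumption: words with more rater disagreement get noisier mean ratings.
ratings = 3.0 + 0.1 * (X @ true_w) + rng.normal(scale=0.3 * rating_sd)

# Condition A: all items.
r_all, mae_all = evaluate(ratings, cross_val_predict(X, ratings))

# Condition B: drop items whose rater SD exceeds a hypothetical threshold.
max_sd = 1.2
keep = rating_sd <= max_sd
r_clean, mae_clean = evaluate(ratings[keep], cross_val_predict(X[keep], ratings[keep]))

print(f"all items:    n={n}, r={r_all:.3f}, MAE={mae_all:.3f}")
print(f"SD <= {max_sd}: n={keep.sum()}, r={r_clean:.3f}, MAE={mae_clean:.3f}")
```

One design choice worth flagging: condition B changes the evaluation items as well as the training items, so the two scores are not directly comparable. Scoring a model trained on the filtered items against all items would separate the effect of cleaner training data from the effect of an easier test set.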
Anthology ID:
2024.cogalex-1.17
Volume:
Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Michael Zock, Emmanuele Chersoni, Yu-Yin Hsu, Simon de Deyne
Venue:
CogALex
Publisher:
ELRA and ICCL
Pages:
140–150
URL:
https://aclanthology.org/2024.cogalex-1.17
Cite (ACL):
Michael Flor. 2024. Three Studies on Predicting Word Concreteness with Embedding Vectors. In Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024, pages 140–150, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Three Studies on Predicting Word Concreteness with Embedding Vectors (Flor, CogALex 2024)
PDF:
https://aclanthology.org/2024.cogalex-1.17.pdf