An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy

Anmol Goel, Charu Sharma, Ponnurangam Kumaraguru


Abstract
Polysemy is the phenomenon where a single word form possesses two or more related senses. It is ubiquitous in natural language, and its analysis has sparked rich discussion in the linguistics, psychology and philosophy communities alike. Computational linguistics has paid scarce attention to polysemy, and even less to quantifying it. In this paper, we propose a novel, unsupervised framework to estimate polysemy scores for words in multiple languages. We infuse our proposed quantification with syntactic knowledge in the form of dependency structures, so that syntax informs the final polysemy scores of the lexicon. This choice is motivated by recent linguistic findings that suggest an implicit relation between syntax and ambiguity/polysemy. We adopt a graph-based approach, computing the discrete Ollivier-Ricci curvature on a graph of the contextual nearest neighbors. We test our framework on curated datasets that control for different sense distributions of words in three typologically diverse languages: English, French and Spanish. The effectiveness of our framework is demonstrated by significant correlations between our quantification and expert human-annotated language resources such as WordNet. We observe a 0.3 point increase in the correlation coefficient compared to previous quantification studies in English. Our research leverages contextual language models and syntactic structures to empirically support the widely held theoretical linguistic notion that syntax is intricately linked to ambiguity/polysemy.
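The graph-based step named in the abstract can be illustrated with a minimal sketch of discrete Ollivier-Ricci curvature on a toy graph. The abstract does not specify the construction details, so everything concrete here is an assumption made for illustration: the toy graph `demo_adj`, hop-count distances, a uniform probability measure on each node's neighbors (no self-mass), and the helper names `bfs_distances`, `transport_cost` and `ollivier_ricci`. Under those assumptions, the curvature of an edge (x, y) is 1 - W1(mu_x, mu_y) / d(x, y), where W1 is the Wasserstein-1 distance between the two neighbor measures.

```python
from collections import deque
from fractions import Fraction
from math import gcd, inf


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def transport_cost(supply, demand, cost):
    """Minimum cost of moving integer `supply` onto `demand` (min-cost flow)."""
    src, dst = list(supply), list(demand)
    n = len(src) + len(dst) + 2
    S, T = n - 2, n - 1
    # Each edge is [to, residual capacity, cost, index of the reverse edge].
    graph = [[] for _ in range(n)]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    total_mass = sum(supply.values())
    for i, s in enumerate(src):
        add_edge(S, i, supply[s], 0)
        for j, t in enumerate(dst):
            add_edge(i, len(src) + j, total_mass, cost[s][t])
    for j, t in enumerate(dst):
        add_edge(len(src) + j, T, demand[t], 0)

    total = 0
    while True:
        # Bellman-Ford shortest path from S in the residual graph.
        dist, prev = [inf] * n, [None] * n
        dist[S] = 0
        changed = True
        while changed:
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for idx, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v], changed = dist[u] + c, (u, idx), True
        if dist[T] == inf:
            return total  # no augmenting path left: all mass is shipped
        # Find the bottleneck on the path, then push that much flow along it.
        push, v = inf, T
        while v != S:
            u, idx = prev[v]
            push = min(push, graph[u][idx][1])
            v = u
        v = T
        while v != S:
            u, idx = prev[v]
            graph[u][idx][1] -= push
            graph[v][graph[u][idx][3]][1] += push
            v = u
        total += push * dist[T]


def ollivier_ricci(adj, x, y):
    """kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y), mu_v uniform on neighbors of v."""
    dx, dy = len(adj[x]), len(adj[y])
    scale = dx * dy // gcd(dx, dy)  # common denominator, so masses are integers
    supply = {u: scale // dx for u in adj[x]}
    demand = {v: scale // dy for v in adj[y]}
    cost = {u: bfs_distances(adj, u) for u in adj[x]}
    w1 = Fraction(transport_cost(supply, demand, cost), scale)
    return 1 - w1 / bfs_distances(adj, x)[y]


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
demo_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(ollivier_ricci(demo_adj, 0, 1))  # 1/2  (edge inside a triangle)
print(ollivier_ricci(demo_adj, 2, 3))  # -2/3 (edge bridging the two triangles)
```

In this toy graph the edge inside a cluster has positive curvature and the bridging edge has negative curvature, which is the general behavior that makes the measure sensitive to nodes connecting otherwise separate neighborhoods. How the paper turns edge curvatures into a per-word polysemy score, and how dependency structures enter the graph, is not stated in the abstract and is not reproduced here.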
Anthology ID:
2022.emnlp-main.722
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10565–10574
URL:
https://aclanthology.org/2022.emnlp-main.722
DOI:
10.18653/v1/2022.emnlp-main.722
Cite (ACL):
Anmol Goel, Charu Sharma, and Ponnurangam Kumaraguru. 2022. An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10565–10574, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy (Goel et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.722.pdf