Multilingual Substitution-based Word Sense Induction

Denis Kokosinskii, Nikolay Arefyev


Abstract
Word Sense Induction (WSI) is the task of discovering the senses of an ambiguous word by grouping usages of this word into clusters that correspond to these senses. Many approaches have been proposed to solve WSI in English and a few other languages, but they are not easily adaptable to new languages. We present multilingual substitution-based WSI methods that support any of the 100 languages covered by the underlying multilingual language model with minimal to no adaptation required. Despite their multilingual capabilities, our methods perform on par with existing monolingual approaches on popular English WSI datasets. At the same time, they will be most useful for lower-resourced languages, which lack the lexical resources available for English and thus have a higher demand for unsupervised methods like WSI.
Anthology ID: 2024.lrec-main.1035
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 11859–11872
URL: https://aclanthology.org/2024.lrec-main.1035
Cite (ACL): Denis Kokosinskii and Nikolay Arefyev. 2024. Multilingual Substitution-based Word Sense Induction. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11859–11872, Torino, Italia. ELRA and ICCL.
Cite (Informal): Multilingual Substitution-based Word Sense Induction (Kokosinskii & Arefyev, LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1035.pdf