Denis Teslenko
2020
Word Sense Disambiguation for 158 Languages using Word Embeddings Only
Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko
Proceedings of the Twelfth Language Resources and Evaluation Conference
Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models have been developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). Such approaches are particularly useful for under-resourced languages, which lack the resources needed to build either supervised or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. The models and the system are available online.
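The abstract describes the pipeline only at a high level. As an illustration, here is a minimal Python sketch, not the authors' implementation, built on a simplified scheme that is an assumption: the senses of a target word are induced by clustering its nearest neighbours in the embedding space (an ego network whose edges link neighbours that are similar to each other, with connected components standing in for a proper graph-clustering algorithm), and a word in context receives the sense whose averaged vector is closest to the average of the context vectors. The toy vectors in `EMB`, the parameters `k` and `threshold`, and the functions `induce_senses` and `disambiguate` are all hypothetical; real use would load pre-trained fastText vectors instead.

```python
import math

# Hypothetical toy 3-d vectors standing in for pre-trained word embeddings
# (e.g. fastText); real vectors have hundreds of dimensions.
EMB = {
    "bank": [0.7, 0.7, 0.0],
    "river": [1.0, 0.1, 0.0],
    "shore": [0.9, 0.2, 0.0],
    "water": [0.95, 0.0, 0.1],
    "money": [0.1, 1.0, 0.0],
    "loan": [0.2, 0.9, 0.0],
    "deposit": [0.0, 0.95, 0.1],
    "cat": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean(vectors):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def neighbours(word, k):
    """The k words most similar to `word` (excluding the word itself)."""
    others = [w for w in EMB if w != word]
    others.sort(key=lambda w: cosine(EMB[word], EMB[w]), reverse=True)
    return others[:k]


def induce_senses(word, k=6, threshold=0.8):
    """Hypothetical sense induction: link neighbours of `word` whose mutual
    similarity exceeds `threshold`; each connected component is one sense."""
    nodes = neighbours(word, k)
    parent = {n: n for n in nodes}  # union-find forest over the neighbours

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if cosine(EMB[a], EMB[b]) > threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def disambiguate(word, context, senses):
    """Pick the sense whose mean vector is closest to the mean context vector;
    returns None when no context word has an embedding."""
    vecs = [EMB[w] for w in context if w in EMB and w != word]
    if not vecs:
        return None
    ctx = mean(vecs)
    return max(senses, key=lambda s: cosine(ctx, mean([EMB[w] for w in s])))


if __name__ == "__main__":
    senses = induce_senses("bank")
    print(senses)
    print(disambiguate("bank", "we walked along the river shore".split(), senses))
```

With these toy vectors, the neighbours of "bank" split into a river-like group and a finance-like group, because no edge crosses between them. The `threshold` controls how finely senses are split: raising it fragments clusters, lowering it merges them.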
2018
An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail Chernoskutov, Chris Biemann, Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)