Angelina Bolshina
2022
Sense-Annotated Corpus for Russian
Alexander Kirillovich, Natalia Loukachevitch, Maksim Kulaev, Angelina Bolshina, Dmitry Ilvovsky
Proceedings of the Fifth International Conference on Computational Linguistics in Bulgaria (CLIB 2022)
We present a sense-annotated corpus for Russian. The resource was obtained by manually annotating texts from OpenCorpora, an open corpus of Russian, with senses from the Russian wordnet RuWordNet. The annotated data was used as a test collection for comparing an unsupervised method (Personalized PageRank) with pseudo-labeling methods for Russian word sense disambiguation.
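Personalized PageRank, the unsupervised baseline named above, can be illustrated with a small power-iteration sketch. It assumes a toy, hypothetical sense graph for the ambiguous Russian word "zamok" (castle vs. lock), with restart probability placed on the context words of one occurrence; the node names, edges, damping factor, and iteration count are illustrative assumptions, not RuWordNet data and not the authors' implementation.

```python
def personalized_pagerank(graph, personalization, damping=0.85, iterations=50):
    """Power iteration for Personalized PageRank.

    graph: dict mapping each node to its list of neighbours (undirected edges listed both ways).
    personalization: dict mapping context nodes to restart weights (normalized below).
    """
    nodes = list(graph)
    total = sum(personalization.values())
    restart = {n: personalization.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # Teleport step: with probability (1 - damping) jump back to a context node.
        new_rank = {n: (1 - damping) * restart[n] for n in nodes}
        dangling = 0.0
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Nodes without edges send their mass back through the restart vector.
                dangling += damping * rank[n]
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                new_rank[m] += share
        for n in nodes:
            new_rank[n] += dangling * restart[n]
        rank = new_rank
    return rank


# Hypothetical toy graph: two candidate senses of "zamok" plus related concepts.
graph = {
    "zamok.lock": ["klyuch", "dver"],          # lock sense: key, door
    "zamok.castle": ["krepost", "bashnya"],    # castle sense: fortress, tower
    "klyuch": ["zamok.lock", "dver"],
    "dver": ["zamok.lock", "klyuch", "krepost"],
    "krepost": ["zamok.castle", "bashnya", "dver"],
    "bashnya": ["zamok.castle", "krepost"],
}
# Context of one occurrence mentions a key and a door.
context = {"klyuch": 1.0, "dver": 1.0}
rank = personalized_pagerank(graph, context)
best_sense = max(["zamok.lock", "zamok.castle"], key=rank.get)
print(best_sense)
```

The candidate sense with the highest stationary score is chosen; here the lock sense sits next to both context nodes, so it accumulates more probability mass than the castle sense.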
2020
Comparison of Genres in Word Sense Disambiguation using Automatically Generated Text Collections
Angelina Bolshina, Natalia Loukachevitch
Proceedings of the Fourth International Conference on Computational Linguistics in Bulgaria (CLIB 2020)
The best approaches in Word Sense Disambiguation (WSD) are supervised and rely on large amounts of hand-labelled data, which is not always available and is costly to create. In our work we describe an approach for creating an automatically labelled collection for Russian based on monosemous relatives (related unambiguous entries). The main contribution of our work is that we extracted monosemous relatives that can be located at relatively long distances from a target ambiguous word and ranked them by their similarity to the target sense. We evaluated word sense disambiguation models based on nearest-neighbour classification over BERT and ELMo embeddings on two text collections. Our work relies on the Russian wordnet RuWordNet.
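The two steps described above, ranking candidate monosemous relatives by similarity to a target sense and labelling a new occurrence by nearest-neighbour search over contextual embeddings, can be sketched as follows. This is a minimal illustration under stated assumptions: the 3-dimensional vectors are hypothetical stand-ins for BERT or ELMo embeddings, cosine similarity is assumed as the similarity measure, and the sense labels and relative names (for "luk", onion vs. bow) are invented for the example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_relatives(sense_vector, relatives):
    """Return relative names sorted from most to least similar to the target sense vector."""
    return sorted(relatives, key=lambda name: cosine(sense_vector, relatives[name]), reverse=True)


def knn_sense(query, labelled, k=1):
    """Label an occurrence by majority vote over its k most similar labelled examples.

    labelled: list of (embedding, sense_label) pairs, e.g. from an automatically labelled collection.
    """
    neighbours = sorted(labelled, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical candidate relatives for the "onion" sense of "luk".
onion_sense = [1.0, 0.0, 0.0]
relatives = {
    "chesnok": [0.9, 0.1, 0.0],   # garlic
    "strela": [0.1, 0.9, 0.0],    # arrow (closer to the "bow" sense)
    "ovoshch": [0.7, 0.3, 0.1],   # vegetable
}
ranked = rank_relatives(onion_sense, relatives)

# Hypothetical labelled contexts of "luk" and one unlabelled occurrence.
labelled = [
    ([0.9, 0.1, 0.0], "onion"),
    ([0.8, 0.2, 0.1], "onion"),
    ([0.1, 0.9, 0.2], "bow"),
    ([0.0, 0.8, 0.3], "bow"),
]
query = [0.85, 0.15, 0.05]
print(ranked, knn_sense(query, labelled, k=1), knn_sense(query, labelled, k=3))
```

With `k=1` this reduces to plain nearest-neighbour classification; a larger `k` smooths over noisy automatically assigned labels at the cost of blurring rare senses.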