Yuri Kiselev

2016

Eliminating Fuzzy Duplicates in Crowdsourced Lexical Resources
Yuri Kiselev | Dmitry Ustalov | Sergey Porshnev
Proceedings of the 8th Global WordNet Conference (GWC)

Collaborative creation of lexical resources is a trending approach to building high-quality thesauri in a short time span at a remarkably low cost. The key idea is to invite non-expert participants to express and share their knowledge with the aim of constructing a resource. However, this approach tends to be noisy and error-prone, which makes data cleansing a highly topical task. In this paper, we study different techniques for synset deduplication, including machine- and crowd-based ones. Finally, we put forward an approach that solves the deduplication problem fully automatically, with quality comparable to that of the expert-based approach.

YARN: Spinning-in-Progress
Pavel Braslavski | Dmitry Ustalov | Mikhail Mukhin | Yuri Kiselev
Proceedings of the 8th Global WordNet Conference (GWC)

YARN (Yet Another RussNet), a project started in 2013, aims to create a large open WordNet-like thesaurus for Russian by means of crowdsourcing. The first stage of the project was to create noun synsets. Currently, the resource comprises 48K+ word entries and 44K+ synsets. More than 200 people have taken part in assembling synsets throughout the project. The paper describes the linguistic, technical, and organizational principles of the project, as well as the evaluation results, lessons learned, and future plans.

Venues

GWC (2)