Sergey Porshnev


Eliminating Fuzzy Duplicates in Crowdsourced Lexical Resources
Yuri Kiselev | Dmitry Ustalov | Sergey Porshnev
Proceedings of the 8th Global WordNet Conference (GWC)

Collaborative creation of lexical resources is a trending approach to building high-quality thesauri in a short time span at a remarkably low cost. The key idea is to invite non-expert participants to express and share their knowledge with the aim of constructing a resource. However, resources built this way tend to be noisy and error-prone, which makes data cleansing a highly topical task. In this paper, we study different techniques for synset deduplication, including machine- and crowd-based ones. Finally, we put forward an approach that solves the deduplication problem fully automatically, with quality comparable to that of the expert-based approach.