AGRR 2019: Corpus for Gapping Resolution in Russian

Maria Ponomareva, Kira Droganova, Ivan Smurov, Tatiana Shavrina


Abstract
This paper provides a comprehensive overview of the gapping dataset for Russian, which consists of 7.5k sentences with gapping (as well as 15k relevant negative sentences) and comprises data from various genres: news, fiction, social media, and technical texts. The dataset was prepared for the Automatic Gapping Resolution Shared Task for Russian (AGRR-2019), a competition aimed at stimulating the development of NLP tools and methods for processing ellipsis. We pay special attention to the gapping resolution methods introduced within the shared task, as well as to an alternative test set, which illustrates that our corpus is a diverse and representative subset of Russian-language gapping, sufficient for effective use of machine learning techniques.
Anthology ID:
W19-3705
Volume:
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Tomaž Erjavec, Michał Marcińczuk, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue:
BSNLP
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Pages:
35–43
URL:
https://aclanthology.org/W19-3705
DOI:
10.18653/v1/W19-3705
Cite (ACL):
Maria Ponomareva, Kira Droganova, Ivan Smurov, and Tatiana Shavrina. 2019. AGRR 2019: Corpus for Gapping Resolution in Russian. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 35–43, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
AGRR 2019: Corpus for Gapping Resolution in Russian (Ponomareva et al., BSNLP 2019)
PDF:
https://aclanthology.org/W19-3705.pdf