Oleksiy Syvokon


2023

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Oleksiy Syvokon | Olena Nahorna | Pavlo Kuchmiichuk | Nastasiia Osidach
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

We present a corpus professionally annotated for grammatical error correction (GEC) and fluency edits in the Ukrainian language. We have built two versions of the corpus, GEC+Fluency and GEC-only, to support different applications. To the best of our knowledge, this is the first GEC corpus for the Ukrainian language. We collected texts with errors (33,735 sentences) from a diverse pool of contributors, including both native and non-native speakers. The data cover a wide variety of writing domains, from text chats and essays to formal writing. Professional proofreaders corrected and annotated the corpus for errors relating to fluency, grammar, punctuation, and spelling. This corpus can be used for developing and evaluating GEC systems in Ukrainian. More generally, it can be used for research on multilingual and low-resource NLP, morphologically rich languages, document-level GEC, and fluency correction. The corpus is publicly available at https://github.com/grammarly/ua-gec

The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian
Oleksiy Syvokon | Mariana Romanyshyn
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

This paper presents the results of the UNLP 2023 shared task, the first shared task on grammatical error correction for the Ukrainian language. The task included two tracks: GEC-only and GEC+Fluency. The dataset and evaluation scripts were provided to the participants, and the final results were evaluated on a hidden test set. Six teams submitted their solutions before the deadline, and four teams submitted papers that were accepted to the UNLP workshop proceedings and are referenced in this report. The CodaLab leaderboard remains open for further submissions.