Enriching Grammatical Error Correction Resources for Modern Greek

Katerina Korre, John Pavlopoulos


Abstract
Grammatical Error Correction (GEC), a task of Natural Language Processing (NLP), is challenging for underrepresented languages, and the problem is most prominent in languages other than English. This paper addresses the sparsity of data and systems for GEC in Modern Greek. Following the most popular current approaches in GEC, we develop and test an mT5 multilingual text-to-text transformer for Greek. To our knowledge, this is the first attempt to create a fully-fledged GEC model for Greek. Our evaluation shows that our system reaches up to 52.63% F0.5 on part of the Greek Native Corpus (GNC), which is 16 percentage points below the winning system of the BEA-19 shared task on English GEC. In addition, we provide an extended version of the Greek Learner Corpus (GLC), on which our model reaches up to 22.76% F0.5. Previous versions of the GLC included error annotations but no corrections, which hindered the development of efficient GEC systems. For that reason, we provide a new set of corrections. This new dataset makes it possible to explore the generalisation abilities and robustness of our system, given that the assessment is conducted on learner data while the training is conducted on native data.
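For readers unfamiliar with the metric, the F0.5 scores quoted above weight precision twice as heavily as recall. The following is a minimal sketch of the standard F-beta formula, not the authors' evaluation code; the function names `f_beta` and `f_beta_from_counts` and the example counts are hypothetical and chosen only for illustration.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # No correct edits at all: define the score as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts: true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return f_beta(precision, recall, beta)


# Illustrative (made-up) counts: 6 correct edits, 4 spurious, 14 missed.
score = f_beta_from_counts(tp=6, fp=4, fn=14)
print(f"F0.5 = {score:.4f}")
```

Because beta is below 1, a system that proposes few but mostly correct edits scores higher than one that finds more errors at the cost of many spurious corrections.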
Anthology ID:
2022.lrec-1.532
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4984–4991
URL:
https://aclanthology.org/2022.lrec-1.532
Cite (ACL):
Katerina Korre and John Pavlopoulos. 2022. Enriching Grammatical Error Correction Resources for Modern Greek. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4984–4991, Marseille, France. European Language Resources Association.
Cite (Informal):
Enriching Grammatical Error Correction Resources for Modern Greek (Korre & Pavlopoulos, LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.532.pdf