ERRANT: Assessing and Improving Grammatical Error Type Classification

Katerina Korre, John Pavlopoulos


Abstract
Grammatical Error Correction (GEC) is the task of correcting different types of errors in written texts. To manage this task, large amounts of annotated data that contain erroneous sentences are required. This data, however, is usually annotated according to each annotator’s standards, making it difficult to manage multiple sets of data at the same time. The recently introduced Error Annotation Toolkit (ERRANT) tackled this problem by presenting a way to automatically annotate data that contain grammatical errors, while also providing a standardisation for annotation. ERRANT extracts the errors and classifies them into error types, in the form of an edit that can be used in the creation of GEC systems, as well as for grammatical error analysis. However, we observe that certain errors are falsely or ambiguously classified. This could obstruct any qualitative or quantitative grammatical error type analysis, as the results would be inaccurate. In this work, we use a sample of the FCE corpus (Yannakoudakis et al., 2011) for secondary error type annotation and we show that up to 39% of the annotations of the most frequent type should be re-classified. Our corrections will be publicly released, so that they can serve as the starting point of a broader, collaborative, ongoing correction process.
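As an illustration of the kind of error type analysis the abstract refers to, the following minimal sketch tallies edit types from annotations and computes the share of labels changed by a second annotation pass. It assumes the edits are stored in M2 format, which the abstract does not state. The sample sentences, the helper names (parse_m2_edits, type_counts, reclassified_share) and the label lists are hypothetical and are not taken from the paper or from the FCE data.

```python
from collections import Counter

# Hypothetical sample in M2 format (assumed storage format, not from the paper).
# "S" lines hold the tokenised source sentence; "A" lines hold one edit each:
# A <start> <end>|||<error type>|||<correction>|||REQUIRED|||-NONE-|||<annotator>
M2_SAMPLE = """\
S I have went to school yesterday .
A 1 3|||R:VERB:TENSE|||went|||REQUIRED|||-NONE-|||0

S She like apple .
A 1 2|||R:VERB:SVA|||likes|||REQUIRED|||-NONE-|||0
A 2 3|||R:NOUN:NUM|||apples|||REQUIRED|||-NONE-|||0
"""


def parse_m2_edits(text):
    """Return one dict per edit ("A") line found in M2-formatted text."""
    edits = []
    for line in text.splitlines():
        if not line.startswith("A "):
            continue  # skip sentence lines and blank separators
        fields = line[2:].split("|||")
        start, end = map(int, fields[0].split())
        edits.append(
            {"start": start, "end": end, "type": fields[1], "correction": fields[2]}
        )
    return edits


def type_counts(edits):
    """Count how often each error type occurs."""
    return Counter(edit["type"] for edit in edits)


def reclassified_share(original_types, revised_types):
    """Fraction of edits whose type label differs after a second annotation pass."""
    if len(original_types) != len(revised_types):
        raise ValueError("both label lists must cover the same edits")
    if not original_types:
        return 0.0
    changed = sum(o != r for o, r in zip(original_types, revised_types))
    return changed / len(original_types)


if __name__ == "__main__":
    edits = parse_m2_edits(M2_SAMPLE)
    print(type_counts(edits))
    # Hypothetical labels before and after manual re-annotation.
    print(reclassified_share(["R:OTHER", "R:OTHER"], ["R:OTHER", "R:NOUN"]))
```

Restricting both label lists to the edits of a single original type would give a per-type re-classification share, which is one way a figure such as the one reported for the most frequent type could be computed.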
Anthology ID:
2020.latechclfl-1.10
Volume:
Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
December
Year:
2020
Address:
Online
Venues:
CLFL | COLING | LaTeCH | LaTeCHCLfL
Publisher:
International Committee on Computational Linguistics
Pages:
85–89
URL:
https://aclanthology.org/2020.latechclfl-1.10
Cite (ACL):
Katerina Korre and John Pavlopoulos. 2020. ERRANT: Assessing and Improving Grammatical Error Type Classification. In Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 85–89, Online. International Committee on Computational Linguistics.
Cite (Informal):
ERRANT: Assessing and Improving Grammatical Error Type Classification (Korre & Pavlopoulos, LaTeCHCLfL 2020)
PDF:
https://aclanthology.org/2020.latechclfl-1.10.pdf
Data
FCE