A Reassessment of Reference-Based Grammatical Error Correction Metrics

Shamil Chollampatt; Hwee Tou Ng


Abstract

Several metrics have been proposed for evaluating grammatical error correction (GEC) systems based on grammaticality, fluency, and adequacy of the output sentences. Previous studies of the correlation of these metrics with human quality judgments were inconclusive, due to the lack of appropriate significance tests, discrepancies in the methods, and choice of datasets used. In this paper, we re-evaluate reference-based GEC metrics by measuring the system-level correlations with humans on a large dataset of human judgments of GEC outputs, and by properly conducting statistical significance tests. Our results show no significant advantage of GLEU over MaxMatch (M2), contradicting previous studies that claim GLEU to be superior. For a finer-grained analysis, we additionally evaluate these metrics for their agreement with human judgments at the sentence level. Our sentence-level analysis indicates that, between GLEU and M2, one metric may be more useful than the other depending on the scenario. We further qualitatively analyze these metrics, and our findings show that, apart from being less interpretable and non-deterministic, GLEU also produces counter-intuitive scores in commonly occurring test examples.

Anthology ID: C18-1231
Volume: Proceedings of the 27th International Conference on Computational Linguistics
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 2730–2741
URL: https://aclanthology.org/C18-1231/
Cite (ACL): Shamil Chollampatt and Hwee Tou Ng. 2018. A Reassessment of Reference-Based Grammatical Error Correction Metrics. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2730–2741, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): A Reassessment of Reference-Based Grammatical Error Correction Metrics (Chollampatt & Ng, COLING 2018)
PDF: https://aclanthology.org/C18-1231.pdf
