CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

Jingheng Ye; Zishan Xu; Yinghui Li; Linlin Song; Qingyu Zhou; Hai-Tao Zheng; Ying Shen; Wenhao Jiang; Hong-Gee Kim; Ruitong Liu; Xin Su; Zifei Shan

doi:10.18653/v1/2025.acl-long.10

Abstract

This paper focuses on the interpretability of Grammatical Error Correction (GEC) evaluation metrics, which has received little attention in previous studies. To bridge this gap, we introduce **CLEME2.0**, a reference-based metric that describes four fundamental aspects of GEC systems: hit-correction, wrong-correction, under-correction, and over-correction. Together, these aspects expose critical qualities of GEC systems and help locate their drawbacks. Evaluating systems by combining these aspects also yields higher consistency with human judgments than other reference-based and reference-less metrics. Extensive experiments on two human judgment datasets and six reference datasets demonstrate the effectiveness and robustness of our method, which achieves a new state-of-the-art result. Our code is released at https://github.com/THUKElab/CLEME.
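As a rough illustration of the four aspects named in the abstract, the sketch below counts them for one hypothesis against one reference. It is not the CLEME2.0 algorithm: the edit representation `(start, end, replacement)`, the function name `classify_edits`, and the exact-span matching rule are all assumptions made for this example, and overlapping or differently segmented edits are not handled.

```python
from typing import Dict, List, Tuple

# Hypothetical edit representation: (start, end, replacement),
# where start/end are token offsets into the source sentence.
Edit = Tuple[int, int, str]


def classify_edits(hyp_edits: List[Edit], ref_edits: List[Edit]) -> Dict[str, int]:
    """Count hit/wrong/under/over corrections using exact span matching.

    Assumed reading of the four aspects (not taken from the paper):
      hit   - the system edits a reference span with the reference text
      wrong - the system edits a reference span with different text
      under - a reference span the system leaves untouched
      over  - a system edit on a span the reference does not edit
    """
    ref_by_span = {(s, e): r for s, e, r in ref_edits}
    hyp_by_span = {(s, e): r for s, e, r in hyp_edits}
    counts = {"hit": 0, "wrong": 0, "under": 0, "over": 0}

    # Walk the reference edits: each one is hit, wrong, or missed.
    for span, ref_text in ref_by_span.items():
        if span not in hyp_by_span:
            counts["under"] += 1
        elif hyp_by_span[span] == ref_text:
            counts["hit"] += 1
        else:
            counts["wrong"] += 1

    # Any system edit with no reference counterpart is an over-correction.
    for span in hyp_by_span:
        if span not in ref_by_span:
            counts["over"] += 1
    return counts


# Toy example with made-up edits.
ref = [(1, 2, "goes"), (4, 5, "the"), (7, 8, "")]
hyp = [(1, 2, "goes"), (4, 5, "a"), (9, 10, "quickly")]
counts = classify_edits(hyp, ref)
```

Keeping the four counts separate, rather than folding them into a single score immediately, is what lets a reader see whether a system mostly misses errors or mostly introduces unnecessary changes.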

Anthology ID: 2025.acl-long.10
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 204–222
URL: https://aclanthology.org/2025.acl-long.10/
DOI: 10.18653/v1/2025.acl-long.10
Cite (ACL): Jingheng Ye, Zishan Xu, Yinghui Li, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Wenhao Jiang, Hong-Gee Kim, Ruitong Liu, Xin Su, and Zifei Shan. 2025. CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 204–222, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction (Ye et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.10.pdf
