Large Language Models are Good Annotators for Type-aware Data Augmentation in Grammatical Error Correction

Xinyuan Li, Yunshi Lan


Abstract
Large Language Models (LLMs) have achieved outstanding performance across various NLP tasks. Grammatical Error Correction (GEC) is the task of automatically correcting grammatical errors in text, but it suffers from a severe shortage of annotated data. Researchers have tried to exploit the generalization capabilities of LLMs by prompting them to correct erroneous sentences, but this leads to over-correction. In this paper, we rethink the role of LLMs in GEC and propose a method, TypeDA, that treats LLMs as annotators for type-aware data augmentation. Unlike existing data augmentation methods, our method prevents in-distribution corruption and can generate sentences with multi-granularity error types. Our experiments show that our method generally improves the GEC performance of different backbone models with only a small amount of augmented data. Further analyses confirm the high consistency and diversity of the pseudo data generated by our method.
Anthology ID:
2025.coling-main.14
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
199–213
URL:
https://aclanthology.org/2025.coling-main.14/
Cite (ACL):
Xinyuan Li and Yunshi Lan. 2025. Large Language Models are Good Annotators for Type-aware Data Augmentation in Grammatical Error Correction. In Proceedings of the 31st International Conference on Computational Linguistics, pages 199–213, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Large Language Models are Good Annotators for Type-aware Data Augmentation in Grammatical Error Correction (Li & Lan, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.14.pdf