InstructGEC: Enhancing Unsupervised Grammatical Error Correction with Instruction Tuning

Jiayi Deng, Chen Chen, Chunyan Hou, Xiaojie Yuan


Abstract
Recent work has proposed methods for automatically generating synthetic data for unsupervised Grammatical Error Correction (GEC). Although large amounts of synthetic data can be generated at low cost, the data is unrealistic and of poor quality. The copying phenomenon in synthetic data also prevents GEC models from learning the semantic knowledge of contextual language. In this paper, we design an instruction format and apply a masking strategy consistently to both an erroneous sentence and its corresponding instruction to alleviate the impact of the copying phenomenon. We also propose a novel approach, InstructGEC, which uses instruction tuning to integrate the knowledge of grammatical detection into GEC models and thereby address the low-quality issue. Experiments on English and Chinese GEC datasets demonstrate that our method outperforms state-of-the-art unsupervised GEC methods.
Anthology ID:
2025.coling-main.9
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
110–122
URL:
https://aclanthology.org/2025.coling-main.9/
Cite (ACL):
Jiayi Deng, Chen Chen, Chunyan Hou, and Xiaojie Yuan. 2025. InstructGEC: Enhancing Unsupervised Grammatical Error Correction with Instruction Tuning. In Proceedings of the 31st International Conference on Computational Linguistics, pages 110–122, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
InstructGEC: Enhancing Unsupervised Grammatical Error Correction with Instruction Tuning (Deng et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.9.pdf