Mitigating Exposure Bias in Grammatical Error Correction with Data Augmentation and Reweighting

Hannan Cao, Wenmian Yang, Hwee Tou Ng


Abstract
The most popular approach in grammatical error correction (GEC) is based on sequence-to-sequence (seq2seq) models. Like other autoregressive generation tasks, seq2seq GEC faces the exposure bias problem: because of the teacher forcing mechanism, the context tokens are drawn from different distributions during training and testing. In this paper, we propose a novel data manipulation approach to overcome this problem, which includes a data augmentation method during training to mimic the decoder input at inference time, and a data reweighting method to automatically balance the importance of each kind of augmented sample. Experimental results on benchmark GEC datasets show that our method achieves significant improvements compared to prior approaches.
Anthology ID:
2023.eacl-main.155
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2123–2135
URL:
https://aclanthology.org/2023.eacl-main.155
DOI:
10.18653/v1/2023.eacl-main.155
Cite (ACL):
Hannan Cao, Wenmian Yang, and Hwee Tou Ng. 2023. Mitigating Exposure Bias in Grammatical Error Correction with Data Augmentation and Reweighting. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2123–2135, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Mitigating Exposure Bias in Grammatical Error Correction with Data Augmentation and Reweighting (Cao et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.155.pdf
Video:
https://aclanthology.org/2023.eacl-main.155.mp4