Beyond Grammatical Error Correction: Improving L1-influenced research writing in English using pre-trained encoder-decoder models

Gustavo Zomer, Ana Frankenberg-Garcia


Abstract
In this paper, we present a new method for training a writing improvement model adapted to the writer’s first language (L1) that goes beyond grammatical error correction (GEC). Without using annotated training data, we rely solely on pre-trained language models fine-tuned with parallel corpora of reference translations aligned with machine translation. We evaluate our model on corpora of academic papers written in English by L1 Portuguese and L1 Spanish scholars, and on a reference corpus of expert academic English. We show that our model addresses specific L1-influenced writing and more complex linguistic phenomena than existing methods, outperforming a state-of-the-art GEC system in this regard. Our code and data are openly available to other researchers.
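The abstract describes fine-tuning on parallel corpora in which machine translation output is aligned with reference translations. A minimal sketch of how such annotation-free training pairs could be assembled is below; this is not the authors' code, and all function and variable names (e.g. `build_training_pairs`) are illustrative assumptions. The idea is that MT output tends to carry L1-influenced phrasing, so pairing it with professionally translated references yields source→target data for an encoder-decoder model.

```python
# Sketch (not the paper's implementation) of building synthetic
# source->target pairs from a sentence-aligned corpus: MT output as the
# "L1-influenced" source, the reference translation as the target.
# All names here are illustrative assumptions.

def build_training_pairs(mt_sentences, reference_sentences):
    """Pair MT output with reference translations as (source, target) data."""
    if len(mt_sentences) != len(reference_sentences):
        raise ValueError("Corpora must be sentence-aligned")
    pairs = []
    for mt, ref in zip(mt_sentences, reference_sentences):
        # Skip empty lines and pairs where MT already matches the reference,
        # since those carry no corrective signal.
        if mt.strip() and ref.strip() and mt != ref:
            pairs.append({"source": mt, "target": ref})
    return pairs

# Toy example: "pretend" is a classic L1-Portuguese false friend
# (Portuguese "pretender" = "to intend").
mt = ["In this work we pretend to show a new method.",
      "The results were very satisfactory."]
ref = ["In this work we intend to present a new method.",
       "The results were very satisfactory."]
pairs = build_training_pairs(mt, ref)
# Identical pairs are dropped, leaving one corrective training example.
```

The resulting pairs could then be fed to any pre-trained encoder-decoder model's standard sequence-to-sequence fine-tuning routine.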
Anthology ID:
2021.findings-emnlp.216
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2534–2540
URL:
https://aclanthology.org/2021.findings-emnlp.216
DOI:
10.18653/v1/2021.findings-emnlp.216
Cite (ACL):
Gustavo Zomer and Ana Frankenberg-Garcia. 2021. Beyond Grammatical Error Correction: Improving L1-influenced research writing in English using pre-trained encoder-decoder models. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2534–2540, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Beyond Grammatical Error Correction: Improving L1-influenced research writing in English using pre-trained encoder-decoder models (Zomer & Frankenberg-Garcia, Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.216.pdf
Software:
 2021.findings-emnlp.216.Software.zip
Video:
 https://aclanthology.org/2021.findings-emnlp.216.mp4
Code
 gzomer/beyondgec