Leveraging Multilingual Models for Robust Grammatical Error Correction Across Low-Resource Languages

Divesh Ramesh Kubal, Apurva Shrikant Nagvenkar


Abstract
Grammatical Error Correction (GEC) is a crucial task in Natural Language Processing (NLP) aimed at improving the quality of user-generated content, particularly for non-native speakers. This paper introduces a novel end-to-end architecture that uses the M2M100 multilingual transformer model to build a unified GEC system, with a focus on low-resource languages. We propose a synthetic data generation pipeline tailored to language-specific error categories. The system has been implemented for Spanish and shows promising results in evaluations conducted by linguists with expertise in Spanish. We also present an analysis of tracked user interactions, which indicates an acceptance rate of 88.2%.
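
The abstract does not detail the synthetic data generation pipeline, so the following is only a minimal sketch of the general idea: inject language-specific errors into clean sentences to obtain (erroneous, correct) training pairs for a sequence-to-sequence model. The two Spanish error categories used here (omitted acute accents and swapped article gender), the function names, and the corruption probability `p` are illustrative assumptions, not the authors' method.

```python
import random

# Assumed error category 1: omitted acute accents ("canción" -> "cancion").
ACUTE = str.maketrans("áéíóúÁÉÍÓÚ", "aeiouAEIOU")

# Assumed error category 2: wrong article gender ("la" -> "el").
ARTICLE_SWAP = {
    "el": "la", "la": "el",
    "los": "las", "las": "los",
    "un": "una", "una": "un",
}


def drop_accents(token: str) -> str:
    """Remove acute accents from vowels; other characters are unchanged."""
    return token.translate(ACUTE)


def swap_article(token: str) -> str:
    """Swap article gender; the result is lowercased. Non-articles pass through."""
    return ARTICLE_SWAP.get(token.lower(), token)


ERROR_FUNCS = [drop_accents, swap_article]


def corrupt(sentence: str, rng: random.Random, p: float = 0.3) -> str:
    """Apply one randomly chosen error function to each token with probability p."""
    noisy = []
    for token in sentence.split():
        if rng.random() < p:
            token = rng.choice(ERROR_FUNCS)(token)
        noisy.append(token)
    return " ".join(noisy)


def make_pairs(clean_sentences, seed: int = 0, p: float = 0.3):
    """Return (noisy_source, clean_target) pairs from clean sentences."""
    rng = random.Random(seed)
    return [(corrupt(s, rng, p), s) for s in clean_sentences]


if __name__ == "__main__":
    for source, target in make_pairs(["La canción es muy bonita"], seed=1, p=0.5):
        print(source, "->", target)
```

Pairs of this shape could then be used to fine-tune a multilingual encoder-decoder such as M2M100, with the noisy sentence as input and the clean sentence as the target. A realistic pipeline would likely draw on a much richer, linguist-informed error inventory.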
Anthology ID:
2025.coling-industry.43
Volume:
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
505–510
URL:
https://aclanthology.org/2025.coling-industry.43/
Cite (ACL):
Divesh Ramesh Kubal and Apurva Shrikant Nagvenkar. 2025. Leveraging Multilingual Models for Robust Grammatical Error Correction Across Low-Resource Languages. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 505–510, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Leveraging Multilingual Models for Robust Grammatical Error Correction Across Low-Resource Languages (Kubal & Nagvenkar, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-industry.43.pdf