Apurva Shrikant Nagvenkar
2025
Leveraging Multilingual Models for Robust Grammatical Error Correction Across Low-Resource Languages
Divesh Ramesh Kubal | Apurva Shrikant Nagvenkar
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Grammatical Error Correction (GEC) is a crucial task in Natural Language Processing (NLP) aimed at improving the quality of user-generated content, particularly for non-native speakers. This paper introduces a novel end-to-end architecture that uses the M2M100 multilingual transformer model to build a unified GEC system, with a focus on low-resource languages. A synthetic data generation pipeline is proposed, tailored to language-specific error categories. The system has been implemented for Spanish and shows promising results in evaluations conducted by linguists with expertise in the language. Additionally, we present an analysis of tracked user interactions, which shows an acceptance rate of 88.2% based on the actions users performed.
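The abstract does not describe how the synthetic data pipeline works, so the following is only a minimal sketch of the general idea: clean sentences are corrupted with one language-specific error type to produce (noisy, clean) training pairs. The error category chosen here (dropped accent marks in Spanish) and the function names `strip_accents`, `corrupt`, and `make_pairs` are illustrative assumptions, not the authors' method.

```python
import random
import unicodedata


def strip_accents(word: str) -> str:
    """Remove diacritics from vowels (á -> a, ü -> u) while keeping ñ intact."""
    out = []
    for ch in unicodedata.normalize("NFC", word):
        if ch in "ñÑ":
            # ñ is a separate letter in Spanish, not an accented n.
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def corrupt(sentence: str, rate: float, rng: random.Random) -> str:
    """Strip accents from each whitespace-separated word with probability `rate`."""
    words = sentence.split()
    noisy = [strip_accents(w) if rng.random() < rate else w for w in words]
    return " ".join(noisy)


def make_pairs(sentences, rate=0.5, seed=0):
    """Build (noisy source, clean target) pairs for seq2seq fine-tuning.

    Hypothetical helper: the rate and seed are arbitrary illustrative defaults.
    """
    rng = random.Random(seed)
    return [(corrupt(s, rate, rng), s) for s in sentences]
```

Pairs of this shape could then serve as source and target text when fine-tuning a sequence-to-sequence model such as M2M100. A realistic pipeline would presumably cover many more error categories than this single one.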