Paloma Fernandez Mira
Developing NLP Tools with a New Corpus of Learner Spanish
Sam Davidson | Aaron Yamada | Paloma Fernandez Mira | Agustina Carando | Claudia H. Sanchez Gutierrez | Kenji Sagae
Proceedings of the 12th Language Resources and Evaluation Conference
The development of effective NLP tools for the L2 classroom depends largely on the availability of large annotated corpora of language learner text. While annotated learner corpora of English are widely available, large learner corpora of Spanish are less common. The Spanish corpora that are available do not contain the annotations needed to develop tools that benefit language learners, such as grammatical error correction systems. As a result, the field has seen little research on NLP tools designed to benefit Spanish language learners and teachers. We introduce COWS-L2H, a freely available corpus of Spanish learner data that includes error annotations and parallel corrected text to help researchers better understand L2 development, examine teaching practices empirically, and develop NLP tools that better serve the Spanish teaching community. We demonstrate the utility of this corpus by developing a neural network-based grammatical error correction system for Spanish learner writing.