Chiara Argese
2021
Rules Ruling Neural Networks - Neural vs. Rule-Based Grammar Checking for a Low Resource Language
Linda Wiechetek
|
Flammie Pirinen
|
Mika Hämäläinen
|
Chiara Argese
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
We investigate both rule-based and machine learning methods for the task of compound error correction and evaluate their efficiency for North Sámi, a low resource language. The lack of error-free data needed for a neural approach is a challenge to the development of these tools, which is not shared by bigger languages. In order to compensate for that, we used a rule-based grammar checker to remove erroneous sentences and insert compound errors by splitting correct compounds. We describe how we set up the error detection rules, and how we train a bi-RNN based neural network. The precision of the rule-based model tested on a corpus with real errors (81.0%) is slightly better than the neural model (79.4%). The rule-based model is also more flexible with regard to fixing specific errors requested by the user community. However, the neural model has a better recall (98%). The results suggest that an approach that combines the advantages of both models would be desirable in the future. Our tools and data sets are open-source and freely available on GitHub and Zenodo.
2018
Using authentic texts for grammar exercises for a minority language
Lene Antonsen
|
Chiara Argese
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning