Anankan Ravinthirarasa


Neural-based Tamil Grammar Error Detection
Dineskumar Murugesapillai | Anankan Ravinthirarasa | Gihan Dias | Kengatharaiyer Sarveswaran
Proceedings of the First Workshop on Parsing and its Applications for Indian Languages

This paper describes the ongoing development of a grammar error checker for Tamil using a state-of-the-art deep neural approach. The proposed checker captures a vital type of grammar error: subject-predicate agreement errors. Specifically, we target agreement errors between nominal subjects and verbal predicates. We also created the first-ever grammar-error-annotated corpus for Tamil. In addition, we experimented with several multilingual pre-trained language models to capture syntactic information and found that IndicBERT performed best on our task. We implemented the grammar checker as a multi-class classifier on top of the IndicBERT pre-trained model, which we fine-tuned on our annotated data. This baseline model achieves an F1 score of 73.4. We are now improving the system by incorporating a dependency parser.
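A multi-class classifier "on top of" a pre-trained encoder usually means a linear head that maps a sentence representation to one score per class. The dependency-free sketch below illustrates that idea and a macro-averaged F1 score. It rests on several assumptions. The label names are hypothetical, chosen because Tamil verbs agree with their subjects in person, number, and gender; the paper's actual class inventory is not given here. The vector `h` stands in for an encoder output such as the [CLS] embedding, and the weights `W`, `b` are toy values. In a real fine-tuned model, these would be learned jointly with the encoder. The abstract does not say which F1 averaging produced 73.4, so macro-averaging is shown only as one common choice.

```python
import math
from typing import List, Sequence

# Hypothetical label set for illustration only; not the paper's actual classes.
LABELS = ["correct", "person_error", "number_error", "gender_error"]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> int:
    """Linear classification head: logits = W h + b, then argmax over softmax.

    `h` is a stand-in for a sentence embedding from a pre-trained encoder;
    `W` has one row per label and `b` one bias per label (toy values here).
    """
    logits = [sum(w_i * h_i for w_i, h_i in zip(row, h)) + b_k for row, b_k in zip(W, b)]
    probs = softmax(logits)
    # Softmax is monotonic, so this argmax equals the argmax of the raw logits.
    return max(range(len(probs)), key=probs.__getitem__)


def macro_f1(gold: Sequence[int], pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1 scores (0.0 for classes with no support)."""
    f1s = []
    for c in range(n_classes):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / n_classes


if __name__ == "__main__":
    h = [1.0, 0.0, 0.5]  # toy "sentence embedding"
    W = [[0.2, 0.1, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    b = [0.0, 0.0, 0.0, 0.0]
    print(LABELS[classify(h, W, b)])  # prints "person_error"
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1], 2))
```

Macro-averaging weights every class equally. This matters for agreement errors, where grammatical sentences typically far outnumber any single error type. A micro-averaged score would instead be dominated by the majority class.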