Neural-based Tamil Grammar Error Detection

Dineskumar Murugesapillai, Anankan Ravinthirarasa, Gihan Dias, Kengatharaiyer Sarveswaran


Abstract
This paper describes the ongoing development of a grammar error checker for the Tamil language using a state-of-the-art deep neural approach. The proposed checker captures an important type of grammar error: subject-predicate agreement errors. Specifically, we target agreement errors between nominal subjects and verbal predicates. We also created the first-ever grammar-error-annotated corpus for Tamil. In addition, we experimented with different multilingual pre-trained language models to capture syntactic information and found that IndicBERT gives better performance for our task. We implemented the grammar checker as a multi-class classifier on top of the IndicBERT pre-trained model, which we fine-tuned using our annotated data. This baseline model gives an F1 score of 73.4. We are now improving the system with the use of a dependency parser.
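As an illustration of how an F1 score for a multi-class error classifier could be computed, the sketch below derives per-class and macro-averaged F1 from gold and predicted labels. The abstract does not state the label set or the averaging method, so the label names (`no_error`, `person_error`, `number_error`, `gender_error`), the toy data, and the choice of macro averaging are all assumptions made for illustration only.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels and toy data, not taken from the paper's corpus.
gold = ["no_error", "person_error", "number_error",
        "no_error", "gender_error", "no_error"]
pred = ["no_error", "person_error", "no_error",
        "no_error", "gender_error", "number_error"]

scores = per_class_f1(gold, pred)
overall = macro_f1(gold, pred)
```

Macro averaging weights every error class equally regardless of frequency; a micro or weighted average would give a different figure on imbalanced data.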
Anthology ID:
2021.pail-1.4
Volume:
Proceedings of the First Workshop on Parsing and its Applications for Indian Languages
Month:
December
Year:
2021
Address:
NIT Silchar, India
Venue:
PAIL
Publisher:
NLP Association of India (NLPAI)
Pages:
27–32
URL:
https://aclanthology.org/2021.pail-1.4
Cite (ACL):
Dineskumar Murugesapillai, Anankan Ravinthirarasa, Gihan Dias, and Kengatharaiyer Sarveswaran. 2021. Neural-based Tamil Grammar Error Detection. In Proceedings of the First Workshop on Parsing and its Applications for Indian Languages, pages 27–32, NIT Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
Neural-based Tamil Grammar Error Detection (Murugesapillai et al., PAIL 2021)
PDF:
https://aclanthology.org/2021.pail-1.4.pdf