Recycling and Comparing Morphological Annotation Models for Armenian Diachronic-Variational Corpus Processing

Chahan Vidal-Gorène, Victoria Khurshudyan, Anaïd Donabédian-Demopoulos


Abstract
Armenian shows significant variation, and NLP resources are unevenly distributed across its varieties. This paper attempts to train an RNN model for morphological annotation on data from different Armenian varieties, some of which have morphologically annotated corpora and some of which do not, and to compare the annotation results of RNN and rule-based models. Several tests were carried out to evaluate the reuse of an unspecialized lemmatization and POS-tagging model for under-resourced language varieties. The research focused on three dialects and was then extended to Western Armenian, with a mean accuracy of 94.00% in lemmatization and 97.02% in POS-tagging, and it indicates that the models may be reusable for other Armenian varieties. Interestingly, a comparison on Western Armenian between an RNN model trained on Eastern Armenian and the Eastern Armenian National Corpus rule-based model showed a 19% improvement in parsing. This model covers 88.79% of a short heterogeneous dataset in Western Armenian and could serve as a baseline for large-scale corpus annotation in that standard. It is argued that an RNN-based model can be a valid alternative to a rule-based one, considering factors such as time consumption, reusability across varieties of a target language, and significant qualitative results in morphological annotation.
Anthology ID:
2020.vardial-1.9
Volume:
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer
Venue:
VarDial
Publisher:
International Committee on Computational Linguistics (ICCL)
Pages:
90–101
URL:
https://aclanthology.org/2020.vardial-1.9
Cite (ACL):
Chahan Vidal-Gorène, Victoria Khurshudyan, and Anaïd Donabédian-Demopoulos. 2020. Recycling and Comparing Morphological Annotation Models for Armenian Diachronic-Variational Corpus Processing. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 90–101, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
Cite (Informal):
Recycling and Comparing Morphological Annotation Models for Armenian Diachronic-Variational Corpus Processing (Vidal-Gorène et al., VarDial 2020)
PDF:
https://aclanthology.org/2020.vardial-1.9.pdf