Preliminary Results on the Evaluation of Computational Tools for the Analysis of Quechua and Aymara

Marcelo Yuji Himoro, Antonio Pareja-Lora


Abstract
This research evaluates the existing open-source morphological analyzers for two of the most widely spoken indigenous macrolanguages in South America, namely Quechua and Aymara. First, we evaluated their performance (precision, recall and F1 score) on the individual languages for which they were developed (Cuzco Quechua and Aymara). Second, to assess how these tools handle other individual languages of each macrolanguage, we extracted sample text from school textbooks and educational resources. This sample text was edited in the different countries where these macrolanguages are spoken (Colombia, Ecuador, Peru, Bolivia, Chile and Argentina for Quechua; Bolivia, Peru and Chile for Aymara), and it includes their different standardized forms (10 individual languages of Quechua and 3 of Aymara). Processing this text with the tools, we (i) calculated their coverage (number of words recognized and analyzed) and (ii) studied in detail the cases for which each tool was unable to generate any output. Finally, we discuss different ways in which these tools could be optimized in future work, either to improve their performance or, in the specific case of Quechua, to cover more individual languages of this macrolanguage.
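The metrics named in the abstract can be illustrated with a minimal sketch. It assumes that each word has a set of gold-standard analyses and a set of analyses returned by a tool, that precision, recall and F1 are micro-averaged over those sets, and that coverage is reported as the fraction of tokens receiving at least one analysis. The function names, the data layout and the toy labels are hypothetical; the paper's exact definitions may differ.

```python
from typing import Dict, List, Set, Tuple

Analyses = Dict[str, Set[str]]  # word -> set of analysis strings (assumed layout)


def precision_recall_f1(gold: Analyses, predicted: Analyses) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-word analysis sets."""
    tp = fp = fn = 0
    for word, gold_set in gold.items():
        pred_set = predicted.get(word, set())
        tp += len(gold_set & pred_set)   # analyses the tool got right
        fp += len(pred_set - gold_set)   # spurious analyses
        fn += len(gold_set - pred_set)   # gold analyses the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def coverage(tokens: List[str], predicted: Analyses) -> float:
    """Fraction of tokens for which the tool produced at least one analysis."""
    if not tokens:
        return 0.0
    recognized = sum(1 for token in tokens if predicted.get(token))
    return recognized / len(tokens)


# Toy placeholder data, not taken from the paper.
gold = {"w1": {"a", "b"}, "w2": {"c"}}
predicted = {"w1": {"a"}, "w2": {"c", "d"}}

p, r, f1 = precision_recall_f1(gold, predicted)
cov = coverage(["w1", "w2", "w3"], predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} coverage={cov:.3f}")
```

In this sketch, a token with no entry in `predicted` (here `w3`) counts as unrecognized, which corresponds to the cases the abstract describes as producing no output.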
Anthology ID:
2022.lrec-1.584
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5450–5459
URL:
https://aclanthology.org/2022.lrec-1.584
Cite (ACL):
Marcelo Yuji Himoro and Antonio Pareja-Lora. 2022. Preliminary Results on the Evaluation of Computational Tools for the Analysis of Quechua and Aymara. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5450–5459, Marseille, France. European Language Resources Association.
Cite (Informal):
Preliminary Results on the Evaluation of Computational Tools for the Analysis of Quechua and Aymara (Himoro & Pareja-Lora, LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.584.pdf