UCSP Submission to the AmericasNLP 2025 Shared Task

Jorge Asillo Congora; Julio Santisteban; Ricardo Lazo Vasquez

doi:10.18653/v1/2025.americasnlp-1.9

Abstract

Quechua is a low-resource language spoken by more than 7 million people in South America. Although Quechua is primarily an oral language, several orthographic standards exist. None of them is universally adopted, and spelling varies across dialects and regions because current writing practice follows pronunciation. Quechua is a family of languages, with similarities among its seven variants. The lack of parallel data has limited the development of machine translation. We investigate whether extending the existing Quechua parallel dataset with synthetic sentences and using a pre-trained large language model improves Quechua machine translation. A large language model is used to generate the synthetic sentences that extend the parallel dataset, and we fine-tune the mT5 model for translation from Quechua to Spanish and vice versa. Our survey identifies gaps in the state of the art of Quechua machine translation, and our BLEU and chrF++ results show an improvement over the state of the art.

Anthology ID: 2025.americasnlp-1.9
Volume: Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
Month: May
Year: 2025
Address: Albuquerque, New Mexico
Editors: Manuel Mager, Abteen Ebrahimi, Robert Pugh, Shruti Rijhwani, Katharina Von Der Wense, Luis Chiruzzo, Rolando Coto-Solano, Arturo Oncevay
Venues: AmericasNLP | WS
Publisher: Association for Computational Linguistics
Pages: 84–91
URL: https://aclanthology.org/2025.americasnlp-1.9/
DOI: 10.18653/v1/2025.americasnlp-1.9
Cite (ACL): Jorge Asillo Congora, Julio Santisteban, and Ricardo Lazo Vasquez. 2025. UCSP Submission to the AmericasNLP 2025 Shared Task. In Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP), pages 84–91, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): UCSP Submission to the AmericasNLP 2025 Shared Task (Asillo Congora et al., AmericasNLP 2025)
PDF: https://aclanthology.org/2025.americasnlp-1.9.pdf
