Patrick Michael Pumaccahua Huallpa

2026

Optimization of Voice Translation Systems for Indigenous Languages: Retraining the NLLB-200 Model for the Quechua–Spanish Pair
Mitzuko Davis Quispe Callañaupa | Max Erixon Toledo Bernal | Ronil Nilo Torres Bautista | Patrick Michael Pumaccahua Huallpa
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)

This article describes the fine-tuning and incremental retraining of the NLLB-200 model for the Quechua-Spanish language pair, covering the Chanka and Collao variants of Quechua. The work combines a curated dataset of 22,891 parallel pairs, a robust cleaning strategy, and a training setup optimized for consumer hardware (NVIDIA RTX 3060). The results show a progressive improvement in BLEU score, reaching a level that is competitive for low-resource translation, in line with the challenges posed by the IWSLT 2026 shared task.

Co-authors

Mitzuko Davis Quispe Callañaupa 1
Max Erixon Toledo Bernal 1
Ronil Nilo Torres Bautista 1

Venues

IWSLT 1
WS 1
