Optimization of Voice Translation Systems for Indigenous Languages: Retraining the NLLB-200 Model for the Quechua–Spanish Pair

Mitzuko Davis Quispe Callañaupa, Max Erixon Toledo Bernal, Ronil Nilo Torres Bautista, Patrick Michael Pumaccahua Huallpa


Abstract
This article describes the fine-tuning and incremental retraining of the massively multilingual NLLB-200 model for the Quechua (Chanka and Collao varieties)–Spanish language pair. Using a curated dataset of 22,891 parallel pairs, we implemented a robust data-cleaning strategy and a training setup optimized for consumer hardware (an NVIDIA RTX 3060 GPU). The results show a progressive improvement in BLEU, reaching competitive performance for translation in low-resource scenarios, in line with the challenges posed by the IWSLT 2026 shared task.
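The abstract does not detail the cleaning strategy. The sketch below shows what a cleaning pass over parallel pairs commonly involves and is not the authors' pipeline. It assumes standard-library Python, and the function name `clean_parallel_pairs` and the thresholds `max_len_ratio` and `max_tokens` are hypothetical. The filter applies Unicode NFC normalization and whitespace collapsing. It then drops empty pairs, untranslated copies, overlong segments, likely misalignments (by word-count ratio), and case-insensitive duplicates.

```python
import unicodedata


def clean_parallel_pairs(pairs, max_len_ratio=3.0, max_tokens=200):
    """Hypothetical cleaning pass over (quechua, spanish) pairs.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Normalize Unicode to NFC and collapse runs of whitespace.
        src = " ".join(unicodedata.normalize("NFC", src).split())
        tgt = " ".join(unicodedata.normalize("NFC", tgt).split())
        # Drop pairs where either side is empty after normalization.
        if not src or not tgt:
            continue
        # Drop untranslated copies (source identical to target).
        if src.lower() == tgt.lower():
            continue
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop overly long segments.
        if n_src > max_tokens or n_tgt > max_tokens:
            continue
        # Drop pairs with implausible word-count ratios (likely misaligned).
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_len_ratio:
            continue
        # Remove exact duplicates, ignoring case.
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((src, tgt))
    return cleaned


if __name__ == "__main__":
    toy = [
        ("Allillanchu?", "¿Cómo estás?"),
        ("Allillanchu?  ", "¿Cómo  estás?"),  # duplicate after normalization
        ("", "Hola"),                         # empty source
        ("Hola", "Hola"),                     # untranslated copy
        ("Wasi", "La casa es muy grande y está al lado del río"),  # ratio 11:1
    ]
    print(clean_parallel_pairs(toy))  # [('Allillanchu?', '¿Cómo estás?')]
```

Word-count ratios are a cheap, language-agnostic proxy for misalignment. Quechua is agglutinative, so a single Quechua word often corresponds to several Spanish words, and any ratio threshold would need tuning on the actual corpus.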

Anthology ID: 2026.iwslt-1.35
Volume: Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month: July
Year: 2026
Address: San Diego, USA (in-person and online)
Editors: Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues: IWSLT | WS
SIG: SIGSLT
Publisher: Association for Computational Linguistics
Pages: 316–317
URL: https://aclanthology.org/2026.iwslt-1.35/
Cite (ACL): Mitzuko Davis Quispe Callañaupa, Max Erixon Toledo Bernal, Ronil Nilo Torres Bautista, and Patrick Michael Pumaccahua Huallpa. 2026. Optimization of Voice Translation Systems for Indigenous Languages: Retraining the NLLB-200 Model for the Quechua–Spanish Pair. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 316–317, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal): Optimization of Voice Translation Systems for Indigenous Languages: Retraining the NLLB-200 Model for the Quechua–Spanish Pair (Quispe Callañaupa et al., IWSLT 2026)
PDF: https://aclanthology.org/2026.iwslt-1.35.pdf