BibTeX

@inproceedings{galiano-jimenez-etal-2025-beyond,
    title = "Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs",
    author = "Galiano-Jim{\'e}nez, Aar{\'o}n and
      P{\'e}rez-Ortiz, Juan Antonio and
      S{\'a}nchez-Mart{\'i}nez, Felipe and
      S{\'a}nchez-Cartagena, V{\'i}ctor M.",
    editor = "Chiruzzo, Luis and
      Ritter, Alan and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.372/",
    doi = "10.18653/v1/2025.findings-naacl.372",
    pages = "6661--6676",
    ISBN = "979-8-89176-195-7",
    abstract = "This paper delves into sequence-level knowledge distillation (KD) of multilingual pre-trained translation models. We posit that, beyond the approximated mode obtained via beam search, the whole output distribution of the teacher contains valuable insights for students. We explore the potential of n-best lists from beam search to guide the student{'}s learning and then investigate alternative decoding methods to address observed issues like low variability and under-representation of infrequent tokens. Our research in data-limited scenarios reveals that although sampling methods can slightly compromise the translation quality of the teacher output compared to beam-search-based methods, they enrich the generated corpora with increased variability and lexical richness, ultimately enhancing student model performance and reducing the gender bias amplification commonly associated with KD."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="galiano-jimenez-etal-2025-beyond">
    <titleInfo>
      <title>Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Aarón</namePart>
      <namePart type="family">Galiano-Jiménez</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Juan</namePart>
      <namePart type="given">Antonio</namePart>
      <namePart type="family">Pérez-Ortiz</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Felipe</namePart>
      <namePart type="family">Sánchez-Martínez</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Víctor</namePart>
      <namePart type="given">M</namePart>
      <namePart type="family">Sánchez-Cartagena</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-04</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: NAACL 2025</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Luis</namePart>
        <namePart type="family">Chiruzzo</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Alan</namePart>
        <namePart type="family">Ritter</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Lu</namePart>
        <namePart type="family">Wang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Albuquerque, New Mexico</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-195-7</identifier>
    </relatedItem>
    <abstract>This paper delves into sequence-level knowledge distillation (KD) of multilingual pre-trained translation models. We posit that, beyond the approximated mode obtained via beam search, the whole output distribution of the teacher contains valuable insights for students. We explore the potential of n-best lists from beam search to guide the student’s learning and then investigate alternative decoding methods to address observed issues like low variability and under-representation of infrequent tokens. Our research in data-limited scenarios reveals that although sampling methods can slightly compromise the translation quality of the teacher output compared to beam-search-based methods, they enrich the generated corpora with increased variability and lexical richness, ultimately enhancing student model performance and reducing the gender bias amplification commonly associated with KD.</abstract>
    <identifier type="citekey">galiano-jimenez-etal-2025-beyond</identifier>
    <identifier type="doi">10.18653/v1/2025.findings-naacl.372</identifier>
    <location>
      <url>https://aclanthology.org/2025.findings-naacl.372/</url>
    </location>
    <part>
      <date>2025-04</date>
      <extent unit="page">
        <start>6661</start>
        <end>6676</end>
      </extent>
    </part>
  </mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs
%A Galiano-Jiménez, Aarón
%A Pérez-Ortiz, Juan Antonio
%A Sánchez-Martínez, Felipe
%A Sánchez-Cartagena, Víctor M.
%Y Chiruzzo, Luis
%Y Ritter, Alan
%Y Wang, Lu
%S Findings of the Association for Computational Linguistics: NAACL 2025
%D 2025
%8 April
%I Association for Computational Linguistics
%C Albuquerque, New Mexico
%@ 979-8-89176-195-7
%F galiano-jimenez-etal-2025-beyond
%X This paper delves into sequence-level knowledge distillation (KD) of multilingual pre-trained translation models. We posit that, beyond the approximated mode obtained via beam search, the whole output distribution of the teacher contains valuable insights for students. We explore the potential of n-best lists from beam search to guide the student’s learning and then investigate alternative decoding methods to address observed issues like low variability and under-representation of infrequent tokens. Our research in data-limited scenarios reveals that although sampling methods can slightly compromise the translation quality of the teacher output compared to beam-search-based methods, they enrich the generated corpora with increased variability and lexical richness, ultimately enhancing student model performance and reducing the gender bias amplification commonly associated with KD.
%R 10.18653/v1/2025.findings-naacl.372
%U https://aclanthology.org/2025.findings-naacl.372/
%U https://doi.org/10.18653/v1/2025.findings-naacl.372
%P 6661-6676

Markdown (Informal)
[Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs](https://aclanthology.org/2025.findings-naacl.372/) (Galiano-Jiménez et al., Findings 2025)
ACL
Aarón Galiano-Jiménez, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, and Víctor M. Sánchez-Cartagena. 2025. [Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs](https://aclanthology.org/2025.findings-naacl.372/). In *Findings of the Association for Computational Linguistics: NAACL 2025*, pages 6661–6676, Albuquerque, New Mexico. Association for Computational Linguistics.
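
The two decoding strategies the abstract contrasts for generating the distilled corpus are easy to picture in code. Below is a minimal, hypothetical sketch (not the authors' released code) of how a seq2seq teacher can produce multiple translations per source sentence: keeping the n-best list from beam search versus drawing samples from the output distribution. The Hugging Face model name, sample counts, and decoding hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Illustrative sketch of sequence-level KD data generation with a
# Hugging Face seq2seq teacher. Model and hyperparameters are placeholders.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "Helsinki-NLP/opus-mt-en-de"  # stand-in teacher (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
teacher = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).eval()


def teacher_nbest(sources: list[str], n: int = 4) -> list[str]:
    """Mode-seeking variant: keep the full n-best list from beam search,
    not only the single best (approximate mode) hypothesis."""
    batch = tokenizer(sources, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = teacher.generate(
            **batch,
            num_beams=n,
            num_return_sequences=n,  # all n beam hypotheses per source
            max_new_tokens=128,
        )
    return tokenizer.batch_decode(out, skip_special_tokens=True)


def teacher_samples(sources: list[str], n: int = 4, top_p: float = 0.9) -> list[str]:
    """Sampling variant: draw n translations per source from the teacher's
    output distribution (nucleus sampling here), trading some quality for
    more variability and lexical richness in the distilled corpus."""
    batch = tokenizer(sources, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = teacher.generate(
            **batch,
            do_sample=True,
            top_p=top_p,
            num_return_sequences=n,
            max_new_tokens=128,
        )
    return tokenizer.batch_decode(out, skip_special_tokens=True)


if __name__ == "__main__":
    src = ["The committee approved the proposal yesterday."]
    print(teacher_nbest(src))    # n high-probability, often similar outputs
    print(teacher_samples(src))  # n more varied outputs
```

Under this sketch, the student would then be trained with ordinary cross-entropy on the resulting (source, teacher-output) pairs, which is what makes the distillation sequence-level rather than token-level.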