@inproceedings{cruz-2025-extracting,
title = "Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation",
author = "Cruz, Jan Christian Blaise",
editor = "Hettiarachchi, Hansi and
Ranasinghe, Tharindu and
Rayson, Paul and
Mitkov, Ruslan and
Gaber, Mohamed and
Premasiri, Damith and
Tan, Fiona Anting and
Uyangodage, Lasitha",
booktitle = "Proceedings of the First Workshop on Language Models for Low-Resource Languages",
month = jan,
year = "2025",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.loreslm-1.17/",
pages = "219--224",
abstract = "In this paper, we propose the use of simple knowledge distillation to produce smaller and more efficient single-language transformers from Massively Multilingual Transformers (MMTs) to alleviate tradeoffs associated with the use of such in low-resource settings. Using Tagalog as a case study, we show that these smaller single-language models perform on-par with strong baselines in a variety of benchmark tasks in a much more efficient manner. Furthermore, we investigate additional steps during the distillation process that improves the soft-supervision of the target language, and provide a number of analyses and ablations to show the efficacy of the proposed method."
}
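The abstract describes distilling compact single-language students from a massively multilingual teacher. For reference, the sketch below shows a generic soft-label distillation objective in PyTorch (temperature-scaled KL divergence against the teacher's logits plus the usual hard masked-token loss); the temperature, mixing weight, and tensor shapes are illustrative assumptions and are not the paper's settings.

```python
# Minimal sketch of soft-label knowledge distillation for masked language
# modelling, assuming teacher/student models that return per-token logits.
# Hyperparameters below are illustrative, not taken from the paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Mix soft supervision from the teacher with the hard MLM loss.

    student_logits, teacher_logits: (batch, seq_len, vocab) tensors.
    labels: (batch, seq_len) token ids, -100 at non-masked positions.
    """
    # Soft targets: KL divergence between temperature-scaled distributions.
    # reduction="batchmean" divides by the batch size, as PyTorch recommends
    # for KL; the temperature**2 factor keeps the gradient scale comparable
    # to the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard targets: cross-entropy against the masked-token labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft + (1.0 - alpha) * hard
```

The language-specific extraction step the paper focuses on (e.g. reducing the student to a target-language vocabulary before distilling) is not shown here; consult the paper for those details.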
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="cruz-2025-extracting">
    <titleInfo>
      <title>Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Jan</namePart>
      <namePart type="given">Christian</namePart>
      <namePart type="given">Blaise</namePart>
      <namePart type="family">Cruz</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-01</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the First Workshop on Language Models for Low-Resource Languages</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Hansi</namePart>
        <namePart type="family">Hettiarachchi</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Tharindu</namePart>
        <namePart type="family">Ranasinghe</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Paul</namePart>
        <namePart type="family">Rayson</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Ruslan</namePart>
        <namePart type="family">Mitkov</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Mohamed</namePart>
        <namePart type="family">Gaber</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Damith</namePart>
        <namePart type="family">Premasiri</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Fiona</namePart>
        <namePart type="given">Anting</namePart>
        <namePart type="family">Tan</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Lasitha</namePart>
        <namePart type="family">Uyangodage</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Abu Dhabi, United Arab Emirates</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this paper, we propose the use of simple knowledge distillation to produce smaller and more efficient single-language transformers from Massively Multilingual Transformers (MMTs) to alleviate tradeoffs associated with the use of such models in low-resource settings. Using Tagalog as a case study, we show that these smaller single-language models perform on par with strong baselines in a variety of benchmark tasks in a much more efficient manner. Furthermore, we investigate additional steps during the distillation process that improve the soft-supervision of the target language, and provide a number of analyses and ablations to show the efficacy of the proposed method.</abstract>
    <identifier type="citekey">cruz-2025-extracting</identifier>
    <location>
      <url>https://aclanthology.org/2025.loreslm-1.17/</url>
    </location>
    <part>
      <date>2025-01</date>
      <extent unit="page">
        <start>219</start>
        <end>224</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation
%A Cruz, Jan Christian Blaise
%Y Hettiarachchi, Hansi
%Y Ranasinghe, Tharindu
%Y Rayson, Paul
%Y Mitkov, Ruslan
%Y Gaber, Mohamed
%Y Premasiri, Damith
%Y Tan, Fiona Anting
%Y Uyangodage, Lasitha
%S Proceedings of the First Workshop on Language Models for Low-Resource Languages
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates
%F cruz-2025-extracting
%X In this paper, we propose the use of simple knowledge distillation to produce smaller and more efficient single-language transformers from Massively Multilingual Transformers (MMTs) to alleviate tradeoffs associated with the use of such models in low-resource settings. Using Tagalog as a case study, we show that these smaller single-language models perform on par with strong baselines in a variety of benchmark tasks in a much more efficient manner. Furthermore, we investigate additional steps during the distillation process that improve the soft-supervision of the target language, and provide a number of analyses and ablations to show the efficacy of the proposed method.
%U https://aclanthology.org/2025.loreslm-1.17/
%P 219-224
Markdown (Informal)
[Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation](https://aclanthology.org/2025.loreslm-1.17/) (Cruz, LoResLM 2025)
ACL
Jan Christian Blaise Cruz. 2025. Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, pages 219–224, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.