Multilingual Lottery Tickets to Pretrain Language Models

Jaeseong Lee, Seung-won Hwang


Abstract
The curse of multilinguality in training multilingual pretrained language models (mPLMs) refers to the negative interference between languages, which is especially severe when model capacity is limited. While increasing capacity may appear to be an intuitive remedy, it raises both training and inference costs. Our distinction is pursuing the competing goals of reducing negative interference while keeping per-language capacity more or less the same. Specifically, we first scale the model to reduce interference, then search for a per-language subnetwork, or lottery ticket, with performance comparable to the full model. According to the lottery ticket hypothesis, this scale-then-find-ticket approach alleviates interfering signals as in the scaled model, but redistributes parameters so that the per-language parameter count stays reduced. Finally, to avoid the cost of repeatedly retraining to search for multilingual tickets, we explore zero-shot neural architecture search (NAS) methods and identify the one most appropriate for finding multilingual tickets. Our proposed multilingual tickets reduce the inference cost of the model for each language while boosting performance. The ticket search cost is negligible, and the tickets found qualitatively preserve linguistic similarity. Our code is publicly available.
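To make the scale-then-find-ticket idea concrete, the following is a minimal, hypothetical sketch: a shared (scaled) weight vector is pruned into one binary mask per language by magnitude, so each language's subnetwork keeps only a fraction of the scaled model's parameters. Magnitude pruning stands in here for the paper's zero-shot NAS search; the language codes and the 50% keep ratio are illustrative assumptions, not the authors' settings.

```python
import random

def magnitude_ticket(weights, keep_ratio):
    """Binary mask keeping the top keep_ratio fraction of weights by magnitude.

    This is a stand-in for the paper's ticket search (which uses
    zero-shot NAS), shown only to illustrate per-language subnetworks.
    """
    k = max(1, int(len(weights) * keep_ratio))
    # k-th largest absolute value acts as the pruning threshold
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [1 if abs(w) >= threshold else 0 for w in weights]

random.seed(0)
# Flattened weights of a (toy) scaled multilingual model
shared = [random.gauss(0, 1) for _ in range(64)]

# One subnetwork mask per language; a small per-language perturbation
# simulates language-specific importance scores. Each ticket keeps ~50%
# of the scaled model, so per-language capacity stays roughly unscaled.
masks = {
    lang: magnitude_ticket([w + random.gauss(0, 0.1) for w in shared], 0.5)
    for lang in ["en", "sw", "ko"]
}

for lang, mask in masks.items():
    print(lang, sum(mask) / len(mask))  # fraction of parameters kept
```

At inference time only the masked subnetwork for the target language would be evaluated, which is how per-language cost drops even though the shared model was scaled up.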
Anthology ID:
2023.findings-emnlp.629
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9387–9398
URL:
https://aclanthology.org/2023.findings-emnlp.629
DOI:
10.18653/v1/2023.findings-emnlp.629
Cite (ACL):
Jaeseong Lee and Seung-won Hwang. 2023. Multilingual Lottery Tickets to Pretrain Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9387–9398, Singapore. Association for Computational Linguistics.
Cite (Informal):
Multilingual Lottery Tickets to Pretrain Language Models (Lee & Hwang, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.629.pdf