When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale

Christos Baziotis, Biao Zhang, Alexandra Birch, Barry Haddow


Abstract
Multilingual machine translation (MMT), trained on a mixture of parallel and monolingual data, is key for improving translation in low-resource language pairs. However, the literature offers conflicting results on the performance of different methods of including monolingual data. To resolve this, we examine how denoising autoencoding (DAE) and backtranslation (BT) impact MMT under different data conditions and model scales. Unlike prior studies, we use a realistic dataset of 100 translation directions and consider many domain combinations of monolingual and test data. We find that monolingual data generally helps MMT, but models are surprisingly brittle to domain mismatches, especially at smaller model scales. BT is beneficial when the parallel, monolingual, and test data sources are similar but can be detrimental otherwise, while DAE is less effective than previously reported. Next, we analyze the impact of scale (from 90M to 1.6B parameters) and find it is important for both methods, particularly DAE. As scale increases, DAE transitions from underperforming the parallel-only baseline at 90M to converging with BT performance at 1.6B, and even surpassing it in low-resource settings. These results offer new insights into how to best use monolingual data in MMT.
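The two techniques the abstract compares can be illustrated with a minimal sketch. This is not the authors' implementation; the `dae_noise` and `back_translate` functions, the masking probability, and the toy reverse model are all hypothetical stand-ins, shown only to clarify how each method turns a monolingual sentence into a synthetic training pair.

```python
import random

random.seed(0)  # for reproducibility of the toy example

def dae_noise(sentence, mask_token="<mask>", mask_prob=0.3):
    # DAE: corrupt the monolingual sentence; the model is trained to
    # reconstruct the original from the corrupted input.
    tokens = sentence.split()
    noised = [mask_token if random.random() < mask_prob else t for t in tokens]
    return " ".join(noised)

def back_translate(sentence, reverse_model):
    # BT: a target-to-source model produces a synthetic source sentence;
    # the pair (synthetic source, original target) augments the parallel data.
    return reverse_model(sentence), sentence

# Hypothetical stand-in for a trained reverse translation model.
toy_reverse_model = lambda s: " ".join(reversed(s.split()))

mono = "the cat sat on the mat"
dae_pair = (dae_noise(mono), mono)              # (corrupted input, original target)
bt_pair = back_translate(mono, toy_reverse_model)  # (synthetic source, original target)
print(dae_pair)
print(bt_pair)
```

In both cases the original monolingual sentence serves as the training target; the methods differ only in how the synthetic input is produced, which is why their sensitivity to the domain of the monolingual data can diverge.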
Anthology ID:
2024.naacl-long.349
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
6297–6324
URL:
https://aclanthology.org/2024.naacl-long.349
Cite (ACL):
Christos Baziotis, Biao Zhang, Alexandra Birch, and Barry Haddow. 2024. When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6297–6324, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale (Baziotis et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.349.pdf
Copyright:
 2024.naacl-long.349.copyright.pdf