Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task

Gaëtan Caillaut; Mariam Nakhlé; Raheel Qader; Jingshu Liu; Jean-Gabriel Barthélemy

doi:10.18653/v1/2024.wmt-1.124

Abstract

Recent studies have showcased remarkable capabilities of decoder-only models in many NLP tasks, including translation. Yet, the machine translation field has been largely dominated by encoder-decoder models based on the Transformer architecture. As a consequence, scaling laws of encoder-decoder models for neural machine translation have already been well studied, but decoder-only models have received less attention. This work explores the scaling laws of decoder-only models on the multilingual and multidomain translation task. We trained a collection of six decoder-only models, ranging from 70M to 7B parameters, on a sentence-level, multilingual (8 languages) and multidomain (9 domains) dataset. We conducted a series of experiments showing that the loss of decoder-only models can be estimated using a scaling law similar to the one discovered for large language models, but we also show that this scaling law struggles to generalize to much larger models or to a different data distribution. We also study different scaling methods and show that scaling the depth and scaling the width of a model lead to similar test loss improvements, but with different impacts on the model’s efficiency.

Anthology ID: 2024.wmt-1.124
Volume: Proceedings of the Ninth Conference on Machine Translation
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 1318–1331
URL: https://aclanthology.org/2024.wmt-1.124/
DOI: 10.18653/v1/2024.wmt-1.124
Cite (ACL): Gaëtan Caillaut, Mariam Nakhlé, Raheel Qader, Jingshu Liu, and Jean-Gabriel Barthélemy. 2024. Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task. In Proceedings of the Ninth Conference on Machine Translation, pages 1318–1331, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task (Caillaut et al., WMT 2024)
PDF: https://aclanthology.org/2024.wmt-1.124.pdf
