E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness

Chen Linqing, Wang Weilei, Hu Dongyang


Abstract
In the field of Natural Language Processing (NLP), Large-scale Language Models (LLMs) have demonstrated exceptional capabilities across a variety of tasks, including question answering, classification, and particularly, natural language understanding. The integration of neural machine translation with LLMs presents significant potential, transforming the paradigms of cross-lingual communication and information exchange. This study investigates the foundational aspects of LLMs' translation abilities and identifies effective training methodologies to equip them with multilingual capacities. We specifically explore the optimal timing for introducing translation capabilities to LLMs via supervised tasks, considering the inherent bilingual nature of machine translation. Key questions explored include whether it is more beneficial to integrate multiple languages during the pre-training or supervised fine-tuning (SFT) stages, how variations in language ratios influence LLMs' translation abilities, and whether longer or shorter texts are more effective for training these models. This research conducts a thorough investigation by training multiple LLMs from scratch with parameter scales in the billions and enhances the robustness of our findings by upgrading the language capabilities of pre-trained open-source models with parameter scales reaching tens of billions. The aim is to provide a detailed analysis that elucidates the complexities of augmenting machine translation capabilities within LLMs.
Anthology ID:
2024.ccl-1.79
Volume:
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:
July
Year:
2024
Address:
Taiyuan, China
Editors:
Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
1023–1034
Language:
English
URL:
https://aclanthology.org/2024.ccl-1.79/
Cite (ACL):
Chen Linqing, Wang Weilei, and Hu Dongyang. 2024. E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 1023–1034, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):
E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness (Linqing et al., CCL 2024)
PDF:
https://aclanthology.org/2024.ccl-1.79.pdf