E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness

Chen Linqing; Wang Weilei; Hu Dongyang

Abstract

“In the field of Natural Language Processing (NLP), Large-scale Language Models (LLMs) have demonstrated exceptional capabilities across a variety of tasks, including question answering, classification, and particularly, natural language understanding. The integration of neural machine translation with LLMs presents significant potential, transforming the paradigms of cross-lingual communication and information exchange. This study investigates the foundational aspects of LLMs’ translation abilities and identifies effective training methodologies to equip them with multilingual capacities. We specifically explore the optimal timing for introducing translation capabilities to LLMs via supervised tasks, considering the inherent bilingual nature of machine translation. Key questions explored include whether it is more beneficial to integrate multiple languages during the pre-training or supervised fine-tuning (SFT) stages, how variations in language ratios influence LLMs’ translation abilities, and whether longer or shorter texts are more effective for training these models. This research conducts a thorough investigation by training multiple LLMs from scratch with parameter scales in the billions and enhances the robustness of our findings by upgrading the language capabilities of pre-trained open-source models with parameter scales reaching tens of billions. The aim is to provide a detailed analysis that elucidates the complexities of augmenting machine translation capabilities within LLMs.”

Anthology ID: 2024.ccl-1.79
Volume: Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month: July
Year: 2024
Address: Taiyuan, China
Editors: Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue: CCL
Publisher: Chinese Information Processing Society of China
Pages: 1023–1034
Language: English
URL: https://aclanthology.org/2024.ccl-1.79/
Cite (ACL): Chen Linqing, Wang Weilei, and Hu Dongyang. 2024. E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 1023–1034, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal): E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness (Linqing et al., CCL 2024)
PDF: https://aclanthology.org/2024.ccl-1.79.pdf