Wang Weilei
2024
E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness
Chen Linqing | Wang Weilei | Hu Dongyang
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“In the field of Natural Language Processing (NLP), Large-scale Language Models (LLMs) have demonstrated exceptional capabilities across a variety of tasks, including question answering, classification, and particularly, natural language understanding. The integration of neural machine translation with LLMs presents significant potential, transforming the paradigms of cross-lingual communication and information exchange. This study investigates the foundational aspects of LLMs’ translation abilities and identifies effective training methodologies to equip them with multilingual capacities. We specifically explore the optimal timing for introducing translation capabilities to LLMs via supervised tasks, considering the inherent bilingual nature of machine translation. Key questions explored include whether it is more beneficial to integrate multiple languages during the pre-training or supervised fine-tuning (SFT) stages, how variations in language ratios influence LLMs’ translation abilities, and whether longer or shorter texts are more effective for training these models. This research conducts a thorough investigation by training multiple LLMs from scratch with parameter scales in the billions and enhances the robustness of our findings by upgrading the language capabilities of pre-trained open-source models with parameter scales reaching tens of billions. The aim is to provide a detailed analysis that elucidates the complexities of augmenting machine translation capabilities within LLMs.”
2023
Dynamic-FACT: A Dynamic Framework for Adaptive Context-Aware Translation
Chen Linqing | Wang Weilei
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
“Document-level neural machine translation (NMT) has garnered considerable attention since the emergence of various context-aware NMT models. However, these static NMT models are trained on fixed parallel datasets, thus lacking awareness of the target document during inference. In order to alleviate this limitation, we propose a dynamic adapter-translator framework for context-aware NMT, which adapts the trained NMT model to the input document prior to translation. Specifically, the document adapter reconstructs the scrambled portion of the original document from a deliberately corrupted version, thereby reducing the performance disparity between training and inference. To achieve this, we employ an adaptation process in both the training and inference stages. Our experimental results on document-level translation benchmarks demonstrate significant enhancements in translation performance, underscoring the necessity of dynamic adaptation for context-aware translation and the efficacy of our methodologies.”