Enhancing Translation Ability of Large Language Models by Leveraging Task-Related Layers

Pei Cheng, Xiayang Shi, Yinlin Li


Abstract
Fine-tuning Large Language Models (LLMs) for machine translation is effective but costly, and it increases the risk of overfitting and catastrophic forgetting, especially when training data is limited. To tackle these challenges, we propose a novel method that adjusts task-related layers in large models to better harness their machine translation capabilities. The method aims to retain the model’s knowledge of other tasks while optimizing performance on translation tasks. By revealing the structure and characteristics of attention weights through singular value decomposition (SVD), we can make fine adjustments to specific layers, leveraging the model’s potential for more accurate and efficient translation. Our method not only reduces computational resource consumption and the risk of catastrophic forgetting but also offers a new perspective on effectively utilizing the capabilities of large models. Experimental validation shows that adjusting task-related layers significantly improves performance on translation tasks while maintaining stability and accuracy on other tasks. This finding provides valuable insights for fine-tuning and applying large models, advancing the field of machine translation.
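To make the idea in the abstract concrete, the sketch below illustrates one way SVD of per-layer attention weights could be used to pick which layers to leave trainable. It is only a minimal illustration, not the authors' actual procedure: the GPT-2 stand-in model, the effective-rank scoring, and the top-k selection rule are all assumptions introduced here for clarity.

```python
# Illustrative sketch (not the paper's exact method): score each transformer
# block by the effective rank of its attention weight matrix (via SVD) and
# leave only the highest-scoring blocks trainable, freezing the rest.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in model


def effective_rank(weight: torch.Tensor, tol: float = 1e-3) -> int:
    """Count singular values above tol * (largest singular value)."""
    s = torch.linalg.svdvals(weight.float())
    return int((s > tol * s[0]).sum())


# Score each block using its fused QKV attention projection (GPT-2 layout).
scores = {}
for idx, block in enumerate(model.transformer.h):
    scores[idx] = effective_rank(block.attn.c_attn.weight)

# Keep only the top-k blocks trainable; k is an illustrative choice.
k = 4
selected = set(sorted(scores, key=scores.get, reverse=True)[:k])
for idx, block in enumerate(model.transformer.h):
    for p in block.parameters():
        p.requires_grad = idx in selected

print("trainable blocks:", sorted(selected))
```

The frozen model could then be fine-tuned on parallel translation data with any standard training loop, touching only the selected layers; how the paper actually ranks layers and how many it updates should be taken from the paper itself.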
Anthology ID: 2024.lrec-main.540
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 6110–6121
URL: https://aclanthology.org/2024.lrec-main.540
Cite (ACL): Pei Cheng, Xiayang Shi, and Yinlin Li. 2024. Enhancing Translation Ability of Large Language Models by Leveraging Task-Related Layers. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6110–6121, Torino, Italia. ELRA and ICCL.
Cite (Informal): Enhancing Translation Ability of Large Language Models by Leveraging Task-Related Layers (Cheng et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.540.pdf