Pei Cheng


2024

Enhancing Translation Ability of Large Language Models by Leveraging Task-Related Layers
Pei Cheng | Xiayang Shi | Yinlin Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Fine-tuning Large Language Models (LLMs) for machine translation is effective but costly. It also increases the risk of overfitting and catastrophic forgetting, especially when training data is limited. To tackle these challenges, we propose a method that adjusts the task-related layers of a large model to better harness its machine translation capabilities. The method aims to retain the model’s knowledge of other tasks while optimizing performance on translation. Singular value decomposition (SVD) reveals the structure and characteristics of the attention weights, which lets us make fine-grained adjustments to specific layers and use the model’s potential for more accurate and efficient translation. The method addresses both computational cost and catastrophic forgetting, and it offers a new perspective on using the capabilities of large models effectively. Experiments show that adjusting task-related layers significantly improves translation performance while maintaining stability and accuracy on other tasks. These findings provide insights for fine-tuning and applying large models to machine translation.
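
The abstract mentions inspecting attention weights through SVD but does not give the procedure. As a minimal sketch of what such an inspection could look like, the NumPy snippet below computes the singular value spectrum of a weight matrix, estimates how many singular directions carry most of its energy, and builds the corresponding low-rank approximation. This is an illustration under stated assumptions, not the authors' method: the function names, the 99% energy threshold, and the synthetic 64x64 matrix standing in for an attention projection are all hypothetical.

```python
import numpy as np


def singular_spectrum(weight):
    """Return the singular values of a 2-D weight matrix, largest first."""
    return np.linalg.svd(weight, compute_uv=False)


def energy_rank(singular_values, energy=0.9):
    """Smallest k whose top-k squared singular values hold `energy` of the total."""
    squared = singular_values ** 2
    cumulative = np.cumsum(squared) / squared.sum()
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(singular_values))  # guard against rounding at energy=1.0


def low_rank_approx(weight, k):
    """Best rank-k approximation of `weight` in the Frobenius norm."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Hypothetical stand-in for an attention projection matrix:
# a rank-4 signal plus small dense noise (not real model weights).
rng = np.random.default_rng(0)
signal = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
w = signal + 0.01 * rng.standard_normal((64, 64))

s = singular_spectrum(w)
k = energy_rank(s, energy=0.99)
w_k = low_rank_approx(w, k)
rel_error = np.linalg.norm(w - w_k) / np.linalg.norm(w)
print(f"rank for 99% energy: {k}, relative error: {rel_error:.4f}")
```

In a real model, `w` would be replaced by a query, key, value, or output projection loaded from a checkpoint, and comparing spectra across layers is one plausible way to look for layers whose structure differs. How the paper selects and adjusts task-related layers is not stated in the abstract.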