Junhao Ruan
2025
SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment
Yuchun Fan | Yongyu Mu | YiLin Wang | Lei Huang | Junhao Ruan | Bei Li | Tong Xiao | Shujian Huang | Xiaocheng Feng | Jingbo Zhu
Proceedings of the 31st International Conference on Computational Linguistics
Despite the significant improvements achieved by large language models (LLMs) in English reasoning tasks, these models continue to struggle with multilingual reasoning. Recent studies leverage a full-parameter and two-stage training paradigm to teach models to first understand non-English questions and then reason. However, this method suffers from both substantial computational resource consumption and catastrophic forgetting. The fundamental cause is that, with the primary goal of enhancing multilingual comprehension, an excessive number of irrelevant layers and parameters are tuned during the first stage. Given our finding that the representation learning of languages mainly takes place in the lower layers, we propose an efficient multilingual reasoning alignment approach that precisely identifies and fine-tunes the layers responsible for handling multilingualism. Experimental results show that our method, SLAM, tunes only the feed-forward sub-layers of 6 layers, comprising 6.5-8% of all parameters in 7B and 13B LLMs, and achieves better average performance than all strong baselines across 10 languages. Meanwhile, SLAM involves only one training stage, reducing training time by 4.1-11.9× compared to the two-stage method.
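The selective tuning described in the abstract can be sketched in a few lines of PyTorch: freeze the whole model, then re-enable gradients only for the feed-forward (MLP) sub-layers of a few lower layers. This is a hedged illustration, not the paper's released code; the model name and layer indices below are assumptions.

```python
# Minimal sketch of selective feed-forward tuning in the spirit of SLAM.
# The checkpoint name and the choice of 6 lower layers are illustrative.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the MLP (feed-forward) sub-layers of a few lower layers,
# which the abstract identifies as the ones handling multilingual representation.
for idx in range(6):  # hypothetical choice of the 6 lowest layers
    for param in model.model.layers[idx].mlp.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable / total:.1%} of {total:,}")
```

Training then proceeds as a single stage over the frozen model with only these sub-layers receiving gradient updates.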
2024
Exploiting Target Language Data for Neural Machine Translation Beyond Back Translation
Abudurexiti Reheman | Yingfeng Luo | Junhao Ruan | Chunliang Zhang | Anxiang Ma | Tong Xiao | JingBo Zhu
Findings of the Association for Computational Linguistics: ACL 2024
Neural Machine Translation (NMT) encounters challenges when translating in new domains and low-resource languages. To address these issues, researchers have proposed methods to integrate additional knowledge into NMT, such as translation memories (TMs). However, finding TMs that closely match the input sentence remains challenging, particularly in specific domains. On the other hand, monolingual data is widely accessible in most languages, and back-translation is seen as a promising approach for utilizing target language data. Nevertheless, it still necessitates additional training. In this paper, we introduce Pseudo-kNN-MT, a variant of k-nearest neighbor machine translation (kNN-MT) that utilizes target language data by constructing a pseudo datastore. Furthermore, we investigate the utility of large language models (LLMs) for the kNN component. Experimental results demonstrate that our approach exhibits strong domain adaptation capability in both high-resource and low-resource machine translation. Notably, LLMs are found to be beneficial for robust NMT systems.
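The core kNN-MT mechanism the abstract builds on retrieves nearest neighbors from a datastore of (decoder hidden state, target token) pairs and interpolates the retrieval distribution with the NMT model's prediction; in Pseudo-kNN-MT the datastore is built from target-language monolingual data rather than parallel translation memories. Below is a hedged NumPy sketch of that interpolation step, assuming the datastore keys and values already exist; the names and hyperparameters (k, temperature, lambda_knn) are illustrative.

```python
# Minimal sketch of the kNN-MT retrieval-and-interpolation step.
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    """Turn the k nearest datastore entries into a distribution over target tokens."""
    dists = np.linalg.norm(keys - query, axis=1)      # L2 distance to every stored key
    nearest = np.argsort(dists)[:k]                   # indices of the k closest entries
    weights = np.exp(-dists[nearest] / temperature)   # softmax over negative distances
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        p_knn[values[idx]] += w                       # scatter weight onto the stored tokens
    return p_knn

def interpolate(p_model, p_knn, lambda_knn=0.3):
    """Mix the NMT model's distribution with the retrieval distribution."""
    return (1 - lambda_knn) * p_model + lambda_knn * p_knn
```

At each decoding step the current decoder hidden state serves as the query, and the interpolated distribution replaces the model's softmax output when picking the next token.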