2024
Neuron Specialization: Leveraging Intrinsic Task Modularity for Multilingual Machine Translation
Shaomu Tan | Di Wu | Christof Monz
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Training a unified multilingual model promotes knowledge transfer but inevitably introduces negative interference. Language-specific modeling methods show promise in reducing interference. However, they often rely on heuristics to distribute capacity and struggle to foster cross-lingual transfer via isolated modules. In this paper, we explore intrinsic task modularity within multilingual networks and leverage these observations to circumvent interference in multilingual translation. We show that neurons in the feed-forward layers tend to be activated in a language-specific manner. Meanwhile, these specialized neurons exhibit structural overlaps that reflect language proximity, and these overlaps progress across layers. Based on these findings, we propose Neuron Specialization, an approach that identifies specialized neurons to modularize feed-forward layers and then continuously updates them through sparse networks. Extensive experiments show that our approach achieves consistent performance gains over strong baselines, with additional analyses demonstrating reduced interference and increased knowledge transfer.
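The abstract describes the method only at a high level. Below is a minimal sketch of how such specialization might look in a Transformer feed-forward block: neurons are ranked by how often they fire for a given language pair, and a binary mask then restricts the forward pass (and hence the updates) to those specialized neurons. The MaskedFFN class, the top-k selection, and all names are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MaskedFFN(nn.Module):
    """Transformer feed-forward block whose intermediate neurons can be gated per task."""

    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)

    def activation_frequency(self, hidden_states):
        # hidden_states: (num_tokens, d_model) collected for one language pair.
        # Returns the fraction of tokens on which each intermediate neuron fires (ReLU > 0).
        with torch.no_grad():
            fired = (self.fc1(hidden_states) > 0).float()
        return fired.mean(dim=0)  # shape: (d_ff,)

    def specialization_mask(self, freq, top_k):
        # Keep only the top-k most frequently activated neurons for this language pair.
        mask = torch.zeros_like(freq)
        mask[freq.topk(top_k).indices] = 1.0
        return mask

    def forward(self, x, mask):
        # Zeroing non-specialized neurons also blocks their gradient updates for this task.
        h = torch.relu(self.fc1(x)) * mask
        return self.fc2(h)
```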
How Far can 100 Samples Go? Unlocking Zero-Shot Translation with Tiny Multi-Parallel Data
Di Wu | Shaomu Tan | Yan Meng | David Stap | Christof Monz
Findings of the Association for Computational Linguistics: ACL 2024
Zero-shot translation aims to translate between language pairs not seen during training in Multilingual Machine Translation (MMT) and is widely considered an open problem. A common, albeit resource-consuming, solution is to add as many related translation directions as possible to the training corpus. In this paper, we show that for an English-centric model, surprisingly large zero-shot improvements can be achieved by simply fine-tuning with a very small amount of multi-parallel data. For example, on the EC30 dataset, we obtain up to +21.7 ChrF++ overall improvement on non-English directions (870 directions) by using only 100 multi-parallel samples while preserving English-centric translation quality. This performance exceeds M2M100 by an average of 5.9 ChrF++ in the involved non-English directions. When investigating the size effect of fine-tuning data on translation quality, we find that a small, randomly sampled set of fine-tuning directions is already sufficient to achieve comparable improvements. The resulting non-English performance is close to the complete translation upper bound. Even in a minimal setting, fine-tuning with only a single sample, the well-known off-target issue is almost completely resolved, explaining part, but not all, of the observed improvements in translation quality.
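To illustrate how small the required supervision is, the sketch below shows one plausible way such a tiny multi-parallel fine-tuning set could be assembled: sample a handful of mutually aligned sentences and expand them into source-target pairs for every direction. The data layout and function names are assumptions for illustration, not the paper's released pipeline.

```python
import itertools
import random

def build_tiny_finetuning_set(multi_parallel, languages, n_samples=100, seed=0):
    """Sample a few multi-parallel sentences and expand them into all translation directions.

    multi_parallel: dict mapping a language code to its list of mutually aligned sentences.
    Returns a list of (src_lang, tgt_lang, src_text, tgt_text) tuples for fine-tuning.
    """
    rng = random.Random(seed)
    indices = rng.sample(range(len(multi_parallel[languages[0]])), n_samples)
    pairs = []
    for src, tgt in itertools.permutations(languages, 2):
        for i in indices:
            pairs.append((src, tgt, multi_parallel[src][i], multi_parallel[tgt][i]))
    return pairs
```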
UvA-MT’s Participation in the WMT24 General Translation Shared Task
Shaomu Tan | David Stap | Seth Aycock | Christof Monz | Di Wu
Proceedings of the Ninth Conference on Machine Translation
Fine-tuning Large Language Models (FT-LLMs) with parallel data has emerged as a promising paradigm in recent machine translation research. In this paper, we explore the effectiveness of FT-LLMs and compare them to traditional encoder-decoder Neural Machine Translation (NMT) systems in the English-to-Chinese direction of the WMT24 general MT shared task. We implement several techniques, including Quality Estimation (QE) data filtering, supervised fine-tuning, and post-editing that integrates NMT systems with LLMs. We demonstrate that fine-tuning LLaMA2 on a high-quality but relatively small bitext dataset (100K) yields COMET results comparable to much smaller encoder-decoder NMT systems trained on over 22 million bitexts. However, this approach largely underperforms on surface-level metrics like BLEU and ChrF. We further control the data quality using a COMET-based quality estimation method. Our experiments show that 1) filtering out pairs with low COMET scores largely improves encoder-decoder systems, but 2) no clear gains are observed for LLMs when further refining the fine-tuning set. Finally, we show that combining NMT systems with LLMs via post-editing generally yields the best performance on the WMT24 official test set.
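For the QE data-filtering step, the sketch below shows how bitext could be scored and filtered with a reference-free COMET (CometKiwi) model via the unbabel-comet package; the checkpoint name and the 0.80 threshold are illustrative assumptions rather than the exact configuration used in the submission.

```python
from comet import download_model, load_from_checkpoint

def filter_bitext(pairs, threshold=0.80, batch_size=32, gpus=1):
    """Keep only sentence pairs whose reference-free COMET score passes a threshold.

    pairs: list of (source, target) strings.
    """
    # Reference-free quality estimation checkpoint (requires accepting the model license).
    model = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))
    data = [{"src": src, "mt": tgt} for src, tgt in pairs]
    scores = model.predict(data, batch_size=batch_size, gpus=gpus).scores
    return [pair for pair, score in zip(pairs, scores) if score >= threshold]
```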
2023
Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance
Shaomu Tan | Christof Monz
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Multilingual Neural Machine Translation (MNMT) facilitates knowledge sharing but often suffers from poor zero-shot (ZS) translation quality. While prior work has explored the causes of overall low zero-shot translation quality, our work introduces a fresh perspective: the presence of significant variations in zero-shot performance. This suggests that MNMT does not uniformly exhibit poor zero-shot capability; instead, certain translation directions yield reasonable results. Through systematic experimentation spanning 1,560 language directions across 40 languages, we identify three key factors contributing to high variations in ZS NMT performance: 1) target-side translation quality, 2) vocabulary overlap, and 3) linguistic properties. Our findings highlight that target-side translation quality is the most influential factor, with vocabulary overlap consistently impacting zero-shot capabilities. Additionally, linguistic properties, such as language family and writing system, play a role, particularly with smaller models. Furthermore, we suggest that the off-target issue is a symptom of inadequate performance, emphasizing that zero-shot translation challenges extend beyond addressing the off-target problem. To support future research, we release the data and models as a benchmark for the study of ZS NMT.
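The abstract does not spell out how vocabulary overlap is measured; one simple way such a factor could be quantified for a zero-shot direction is sketched below, counting how much of the source language's subword vocabulary also appears in the target language's training data. This is an illustrative measure, not necessarily the paper's exact definition.

```python
def vocab_overlap(src_tokens, tgt_tokens):
    """Fraction of the source-side subword vocabulary that also occurs on the target side.

    src_tokens / tgt_tokens: iterables of subword tokens from the training data of the
    two languages forming a zero-shot direction.
    """
    src_vocab, tgt_vocab = set(src_tokens), set(tgt_tokens)
    return len(src_vocab & tgt_vocab) / len(src_vocab)
```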
UvA-MT’s Participation in the WMT 2023 General Translation Shared Task
Di Wu | Shaomu Tan | David Stap | Ali Araabi | Christof Monz
Proceedings of the Eighth Conference on Machine Translation
This paper describes UvA-MT's submission to the WMT 2023 shared task on general machine translation. We participate in the constrained track in two directions: English ↔ Hebrew. In this competition, we show that by using one model to handle bidirectional tasks, as a minimal setting of Multilingual Machine Translation (MMT), it is possible to achieve results comparable to those of traditional bilingual translation in both directions. By including effective strategies such as back-translation, a re-parameterized embedding table, and task-oriented fine-tuning, we obtain competitive final results in the automatic evaluation for both the English → Hebrew and Hebrew → English directions.