We participate in the translation task on Spanish to Aragonese, Spanish to Aranese, and Spanish to Asturian. Initially, we conduct preliminary experiments to assess the basic translation capabilities of various models and to evaluate the impact of fine-tuning with different data types. We then fine-tune the Qwen2-0.5B model on a forward-synthesized pseudo-corpus from the Apertium translation system to replicate its fundamental performance. Building on this distilled model, we explore three optimization strategies across the three language directions: (1) Assembling the provided FLORES+ dev sets into a 5-shot-format translation training dataset and performing few-shot fine-tuning to enhance model performance. (2) Utilizing the FLORES+ dev sets as training data and applying the Contrastive Preference Optimization (CPO) strategy for further refinement. (3) Retrieving the 20 most similar translation examples from the FLORES+ dev sets using the BM25 algorithm and performing 20-shot translation with the Claude 3.5 Sonnet model. After evaluating these strategies, we select the best-performing approach for each language pair as our submission.
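To make strategy (3) concrete, here is a minimal sketch of retrieving the 20 most similar FLORES+ dev examples with BM25 and assembling them into a 20-shot prompt. It assumes the rank_bm25 package; the toy data and prompt template are illustrative assumptions, not the exact submission pipeline.

```python
# Sketch of strategy (3): BM25 retrieval over FLORES+ dev sources,
# then a 20-shot prompt for an external LLM. Data and template are illustrative.
from rank_bm25 import BM25Okapi

# Hypothetical parallel dev data: (Spanish, Aragonese) pairs.
dev_pairs = [
    ("El sol sale por el este.", "O sol sale por l'este."),
    # ... the full FLORES+ dev set would go here ...
]

source_sentences = [src for src, _ in dev_pairs]
bm25 = BM25Okapi([s.lower().split() for s in source_sentences])

def build_kshot_prompt(query: str, k: int = 20) -> str:
    """Return a k-shot prompt whose examples are the BM25-nearest dev pairs."""
    scores = bm25.get_scores(query.lower().split())
    top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    shots = [f"Spanish: {dev_pairs[i][0]}\nAragonese: {dev_pairs[i][1]}" for i in top_ids]
    return "\n\n".join(shots) + f"\n\nSpanish: {query}\nAragonese:"

# The resulting prompt would then be sent to the Claude 3.5 Sonnet API.
print(build_kshot_prompt("El gato duerme al sol.", k=20))
```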
This paper describes the Shanghai Jiao Tong University (SJTU LoveFiction) Discourse-Level Literary Translation systems for the WMT24 shared task. We participate in the literary translation task on Chinese → English, Chinese → German, and Chinese → Russian in the unconstrained track. Please refer to our paper for details.
We present IMTLab, an open-source end-to-end interactive machine translation (IMT) system platform that enables researchers to quickly build IMT systems with state-of-the-art models, perform end-to-end evaluation, and diagnose the weaknesses of systems. IMTLab treats the whole interactive translation process as a task-oriented dialogue with a human-in-the-loop setting, in which human interventions can be explicitly incorporated to produce high-quality, error-free translations. To this end, a general communication interface is designed to support flexible IMT architectures and user policies. Based on the proposed design, we construct simulated and real interactive environments to achieve end-to-end evaluation and leverage the framework to systematically evaluate previous IMT systems. Our simulated and manual experiments show that the prefix-constrained decoding approach still achieves the lowest editing cost in the end-to-end evaluation, while BiTIIMT achieves a comparable editing cost with a better interactive experience.
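As an illustration of the human-in-the-loop dialogue loop described above, the following minimal sketch shows a prefix-accepting user simulator driving repeated prefix-constrained re-decoding. All class and function names, and the toy translation model, are illustrative assumptions and do not reflect IMTLab's actual interface.

```python
# Sketch of an IMT loop: translate, collect a user intervention, re-decode.
from dataclasses import dataclass

@dataclass
class Intervention:
    """A user edit, e.g. an accepted (and corrected) prefix the system must keep."""
    accepted_prefix: str

class SimulatedUser:
    """Stand-in user policy: keeps the longest correct prefix and fixes one token."""
    def __init__(self, reference: str):
        self.reference = reference

    def review(self, hypothesis: str) -> Intervention:
        ref, hyp = self.reference.split(), hypothesis.split()
        keep = 0
        while keep < min(len(ref), len(hyp)) and ref[keep] == hyp[keep]:
            keep += 1
        # Accept the matching prefix plus one corrected reference token.
        return Intervention(" ".join(ref[: keep + 1]))

def interactive_translate(source: str, user: SimulatedUser, translate_fn, max_turns: int = 10) -> str:
    """Task-oriented dialogue loop with prefix-constrained re-decoding."""
    prefix, hypothesis = "", ""
    for _ in range(max_turns):
        hypothesis = translate_fn(source, prefix)
        if hypothesis == user.reference:
            break
        prefix = user.review(hypothesis).accepted_prefix
    return hypothesis

# Toy translation model that simply respects the accepted prefix.
toy_translate = lambda src, prefix: prefix
user = SimulatedUser("the cat sleeps on the mat")
print(interactive_translate("el gato duerme en la alfombra", user, toy_translate))
```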
Nearest Neighbor Machine Translation (kNN-MT) has achieved great success in domain adaptation tasks by integrating pre-trained Neural Machine Translation (NMT) models with domain-specific token-level retrieval. However, the reasons underlying its success have not been thoroughly investigated. In this paper, we comprehensively analyze kNN-MT through theoretical and empirical studies. Initially, we provide new insights into the working mechanism of kNN-MT as an efficient technique to implicitly execute gradient descent on the output projection layer of NMT, indicating that it is a specific case of model fine-tuning. Subsequently, we conduct multi-domain experiments and word-level analysis to examine the differences in performance between kNN-MT and entire-model fine-tuning. Our findings suggest that: (i) Incorporating kNN-MT with adapters yields comparable translation performance to fine-tuning on in-domain test sets, while achieving better performance on out-of-domain test sets; (ii) Fine-tuning significantly outperforms kNN-MT on the recall of in-domain low-frequency words, but this gap could be bridged by optimizing the context representations with additional adapter layers.
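For readers unfamiliar with kNN-MT, the sketch below illustrates the token-level retrieval and interpolation step the analysis refers to: the next-token distribution mixes the NMT softmax with a distribution built from the k nearest datastore entries. The toy datastore, shapes, and hyperparameters are illustrative assumptions.

```python
# Schematic kNN-MT decoding step with a toy datastore (NumPy only).
import numpy as np

def knn_mt_distribution(h, p_nmt, keys, values, vocab_size, k=4, temperature=10.0, lam=0.5):
    """h: decoder hidden state; keys/values: datastore of (context repr., target token id)."""
    # Retrieve the k nearest datastore keys by L2 distance to the current context.
    dists = np.linalg.norm(keys - h, axis=1)
    nearest = np.argsort(dists)[:k]
    # Convert negative distances into weights over the retrieved target tokens.
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        p_knn[values[idx]] += w
    # Interpolate the retrieval distribution with the NMT softmax; this is the step
    # the paper interprets as implicit gradient descent on the output projection layer.
    return lam * p_knn + (1.0 - lam) * p_nmt

# Toy example: 5-word vocabulary, 8-entry datastore, random hidden state.
rng = np.random.default_rng(0)
vocab_size, dim = 5, 16
keys = rng.normal(size=(8, dim))
values = rng.integers(0, vocab_size, size=8)
h = rng.normal(size=dim)
p_nmt = np.full(vocab_size, 1.0 / vocab_size)
print(knn_mt_distribution(h, p_nmt, keys, values, vocab_size))
```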