Zhengshan Xue

2025

InImageTrans: Multimodal LLM-based Text Image Machine Translation
Fei Zuo | Kehai Chen | Yu Zhang | Zhengshan Xue | Min Zhang
Findings of the Association for Computational Linguistics: ACL 2025

Multimodal large language models (MLLMs) have shown remarkable capabilities across various downstream tasks. However, when MLLMs are transferred to the text image machine translation (TiMT) task, preliminary experiments reveal that they suffer from serious repetition and omission hallucinations. To alleviate these issues, this paper first designs an efficient MLLM named InImageTrans for TiMT and then proposes a simple and effective method named multi-conditional direct preference optimization (mcDPO) to advance TiMT. In particular, the proposed mcDPO not only guides the MLLM to reject repetitive output by automatically creating text output preference pairs, but also guides the MLLM to pay more attention to text information in images by creating image input preference pairs. Furthermore, we build a high-quality benchmark called MCiT for comprehensively evaluating the TiMT capabilities of InImageTrans. Experimental results show that the proposed method significantly outperforms existing open-source MLLMs on MCiT.

2024

It is widely known that hallucination is a critical issue in Simultaneous Machine Translation (SiMT) due to the absence of source-side information. While many efforts have been made to enhance performance for SiMT, few of them attempt to understand and analyze hallucination in SiMT. Therefore, we conduct a comprehensive analysis of hallucination in SiMT from two perspectives: understanding the distribution of hallucination words and their usage of target-side context. Extensive experiments yield some valuable findings and, in particular, show that it is possible to alleviate hallucination by decreasing the overuse of target-side information in SiMT.

2023

Distilling knowledge from a high-resource task, e.g., machine translation, is an effective way to alleviate the data scarcity problem of end-to-end speech translation. However, previous works simply use classical knowledge distillation, which does not allow for adequate transfer of knowledge from machine translation. In this paper, we propose a comprehensive knowledge distillation framework for speech translation, CKDST, which is capable of comprehensively and effectively distilling knowledge from machine translation to speech translation from two perspectives: cross-modal contrastive representation distillation and simultaneous decoupled knowledge distillation. In the former, we leverage a contrastive learning objective to optimize the mutual information between speech and text representations for representation distillation in the encoder. In the latter, we decouple the non-target class knowledge from target class knowledge for logits distillation in the decoder. Experiments on the MuST-C benchmark dataset demonstrate that our CKDST substantially improves the baseline by 1.2 BLEU on average in all translation directions, and outperforms previous state-of-the-art end-to-end and cascaded speech translation models.

2022

Manifold’s English-Chinese System at WMT22 General MT Task
Chang Jin | Tingxun Shi | Zhengshan Xue | Xiaodong Lin
Proceedings of the Seventh Conference on Machine Translation (WMT)

Manifold’s English-Chinese System at WMT22 is an ensemble of 4 models trained with different configurations and scheduled sampling-based fine-tuning. The four configurations are DeepBig (XenC), DeepLarger (XenC), DeepBig-TalkingHeads (XenC) and DeepBig (LaBSE). Concretely, DeepBig extends Transformer-Big to 24 encoder layers. DeepLarger has 20 encoder layers and its feed-forward network (FFN) dimension is 8192. TalkingHeads applies the talking-heads trick. For the XenC configurations, we used XenC to select monolingual and parallel data similar to the past newstest datasets, and for LaBSE, we cleaned the officially provided parallel data using the LaBSE pretrained model. According to the officially released automatic metrics leaderboard, our final constrained system ranked 1st among all systems when evaluated by bleu-all, chrf-all and COMET-B, and 2nd by COMET-A.

2020

In this paper, we present our machine translation system for the Chinese-Japanese bidirectional translation task (a.k.a. the open domain translation task) at IWSLT 2020. Our model is based on the Transformer (Vaswani et al., 2017), with the help of many popular data preprocessing and augmentation methods that are widely proven to be effective. Experiments show that these methods improve the baseline model steadily and significantly.

2019

OPPO NMT System for IWSLT 2019
Xiaopu Li | Zhengshan Xue | Jie Hao
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes OPPO's submission to the IWSLT 2019 text translation task. Our system is based on the Transformer architecture. We also study the effect of model ensembling. On the IWSLT 2019 dev sets, our system reaches a BLEU score of 19.94.

2017

Towards Neural Machine Translation with Partially Aligned Corpora
Yining Wang | Yang Zhao | Jiajun Zhang | Chengqing Zong | Zhengshan Xue
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While neural machine translation (NMT) has become the new paradigm, its parameter optimization requires large-scale parallel data, which is scarce in many domains and language pairs. In this paper, we address a new translation scenario in which only monolingual corpora and phrase pairs exist. We propose a new method for translation with partially aligned sentence pairs, which are derived from the phrase pairs and monolingual corpora. To make full use of the partially aligned corpora, we adapt the conventional NMT training method in two aspects. On the one hand, different generation strategies are designed for aligned and unaligned target words. On the other hand, a different objective function is designed to model the partially aligned parts. The experiments demonstrate that our method can achieve a relatively good result in such a translation scenario, and that tiny bitexts can boost translation quality to a large extent.