2023
The Xiaomi AI Lab’s Speech Translation Systems for IWSLT 2023 Offline Task, Simultaneous Task and Speech-to-Speech Task
Wuwei Huang | Mengge Liu | Xiang Li | Yanzhi Tian | Fengyu Yang | Wen Zhang | Jian Luan | Bin Wang | Yuhang Guo | Jinsong Su
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
This system description paper introduces the systems submitted by Xiaomi AI Lab to three tracks of the IWSLT 2023 Evaluation Campaign: the offline speech translation (Offline-ST) track, the offline speech-to-speech translation (Offline-S2ST) track, and the simultaneous speech translation (Simul-ST) track. All our submissions for these three tracks cover only the English-Chinese language direction. Our English-Chinese speech translation systems are built on large-scale pre-trained models: we fine-tune the models' corresponding components for the various downstream speech translation tasks. We also apply several popular techniques, such as data filtering, data augmentation, speech segmentation, and model ensembling, to improve overall system performance. Extensive experiments show that our systems achieve a significant improvement over strong baseline systems in terms of automatic evaluation metrics.
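As an illustration of one technique named in the abstract, here is a minimal sketch of decoder-side model ensembling. The next_token_logprobs(prefix) interface and the uniform averaging are assumptions for illustration; the paper does not publish its ensembling code.

```python
# Minimal sketch of decoder-side model ensembling. The
# next_token_logprobs(prefix) method is a hypothetical interface.
import numpy as np

def ensemble_step(models, prefix):
    """Average the next-token log-probabilities of several models."""
    logprobs = np.stack([m.next_token_logprobs(prefix) for m in models])
    return logprobs.mean(axis=0)  # uniform weights; tuned weights also work

def greedy_ensemble_decode(models, bos_id, eos_id, max_len=128):
    """Greedy decoding driven by the averaged ensemble distribution."""
    prefix = [bos_id]
    for _ in range(max_len):
        next_id = int(np.argmax(ensemble_step(models, prefix)))
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix
```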
In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model
Yanzhi Tian | Xiang Li | Zeming Liu | Yuhang Guo | Bin Wang
Findings of the Association for Computational Linguistics: EMNLP 2023
In-Image Machine Translation (IIMT) aims to convert images containing text from one language to another. Traditional approaches to this task are cascade methods that apply optical character recognition (OCR) followed by neural machine translation (NMT) and text rendering. However, cascade methods suffer from the compounding errors of OCR and NMT, which degrades translation quality. In this paper, we propose an end-to-end model in place of the OCR, NMT, and text rendering pipeline. Our neural architecture adopts an encoder-decoder paradigm with segmented pixel sequences as inputs and outputs. Through end-to-end training, our model yields improvements across several dimensions: (i) it achieves higher translation quality by avoiding error propagation, (ii) it is robust on out-of-domain data, and (iii) it is insensitive to incomplete words. To validate the effectiveness of our method and support future research, we construct a dataset containing 4M pairs of De-En images and train our end-to-end model on it. The experimental results show that our approach outperforms both the cascade method and the current end-to-end model.
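A minimal sketch of how a text-line image might be turned into a segmented pixel sequence for such an encoder-decoder model; the segment width and function name are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: split a grayscale text-line image into fixed-width
# vertical slices, one flattened vector per slice, as encoder inputs.
# The segment width (8 px) is an illustrative choice.
import numpy as np

def image_to_pixel_segments(image: np.ndarray, seg_width: int = 8) -> np.ndarray:
    """Split an (H, W) image into a (num_segments, H * seg_width) sequence."""
    h, w = image.shape
    pad = (-w) % seg_width                 # pad width to a multiple of seg_width
    image = np.pad(image, ((0, 0), (0, pad)), constant_values=255)
    segments = image.reshape(h, -1, seg_width).transpose(1, 0, 2)
    # Each flattened segment can then be projected into the embedding space.
    return segments.reshape(segments.shape[0], -1)
```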
BIT-ACT: An Ancient Chinese Translation System Using Data Augmentation
Li Zeng | Yanzhi Tian | Yingyu Shan | Yuhang Guo
Proceedings of ALT2023: Ancient Language Translation Workshop
This paper describes a translation model from ancient Chinese to modern Chinese and English for the EvaHan 2023 competition, a subtask of the Ancient Language Translation 2023 challenge. During training, we applied various data augmentation techniques and used SiKu-RoBERTa as part of our model architecture. The results indicate that back translation improves the model's performance, but double back translation introduces noise and harms it. Fine-tuning on the original dataset helps mitigate this issue.
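A minimal sketch of the back-translation augmentation step described above, assuming a hypothetical reverse_model.translate() interface for the modern-to-ancient direction; the system's actual pipeline is not shown in the abstract.

```python
# Minimal sketch of back-translation augmentation: translate monolingual
# target-side text back into the source language to build synthetic pairs.
# reverse_model.translate() is a hypothetical interface.
def back_translate(monolingual_targets, reverse_model):
    """Create synthetic (source, target) pairs from monolingual target text."""
    pairs = []
    for tgt in monolingual_targets:
        synthetic_src = reverse_model.translate(tgt)  # e.g. modern -> ancient
        pairs.append((synthetic_src, tgt))            # train the forward model on these
    return pairs
```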
2022
BIT-Xiaomi’s System for AutoSimTrans 2022
Mengge Liu | Xiang Li | Bao Chen | Yanzhi Tian | Tianwei Lan | Silin Li | Yuhang Guo | Jian Luan | Bin Wang
Proceedings of the Third Workshop on Automatic Simultaneous Translation
This system paper describes the BIT-Xiaomi simultaneous translation system for the AutoSimTrans 2022 simultaneous translation challenge. We participated in three tracks: the Zh-En text-to-text track, the Zh-En audio-to-text track, and the En-Es text-to-text track. In our system, wait-k is employed to train prefix-to-prefix translation models. We integrate streaming chunking to detect boundaries as the source stream is read in. We further improve our system with data selection, data augmentation, and R-Drop training. Results show that our wait-k implementation outperforms the organizers' baseline by up to 8 BLEU, and our proposed streaming chunking method further improves performance by about 2 BLEU in the low-latency regime.
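A minimal sketch of wait-k decoding as named above: wait for the first k source tokens, then alternate between reading one source token and writing one target token. The model.next_token(src, tgt) interface is a hypothetical stand-in; the submitted system's implementation is not shown here.

```python
# Minimal sketch of the wait-k simultaneous decoding policy.
# model.next_token(src_prefix, tgt_prefix) is a hypothetical interface.
def wait_k_decode(model, source_stream, k, eos_id, max_len=256):
    src, tgt = [], []
    for token in source_stream:                  # READ one source token
        src.append(token)
        if len(src) < k:                         # wait until k tokens are read
            continue
        tgt.append(model.next_token(src, tgt))   # then WRITE one token per read
        if tgt[-1] == eos_id:
            return tgt
    while len(tgt) < max_len:                    # source exhausted: finish the tail
        tgt.append(model.next_token(src, tgt))
        if tgt[-1] == eos_id:
            break
    return tgt
```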
Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Data Augmentation
Yanzhi Tian | Yuhang Guo
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
We participated in the EvaHan2022 ancient Chinese word segmentation and part-of-speech (POS) tagging evaluation. We treat Chinese word segmentation and POS tagging as sequence tagging tasks. Our system is based on a BERT-BiLSTM-CRF model trained on the data provided by the EvaHan2022 evaluation. In addition, we employ data augmentation techniques to enhance the model's performance. On Test A and Test B of the evaluation, our system achieves F1 scores of 94.73% and 90.93% for word segmentation, and 89.19% and 83.48% for POS tagging.
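A minimal sketch of a BERT-BiLSTM-CRF tagger of the kind described, using HuggingFace transformers and the pytorch-crf package; the checkpoint name and hyperparameters are illustrative assumptions, not the system's actual settings.

```python
# Minimal sketch of a BERT-BiLSTM-CRF sequence tagger.
# Assumes: pip install transformers pytorch-crf
# The checkpoint and hidden size below are illustrative choices.
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags, bert_name="bert-base-chinese", lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * lstm_hidden, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.lstm(h)
        emissions = self.emit(h)
        mask = attention_mask.bool()
        if tags is not None:  # training: negative log-likelihood under the CRF
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best tag paths
```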