Proceedings of the Second Workshop on Automatic Simultaneous Translation

Hua Wu, Colin Cherry, Liang Huang, Zhongjun He, Qun Liu, Maha Elbayad, Mark Liberman, Haifeng Wang, Mingbo Ma, Ruiqing Zhang (Editors)


Anthology ID: 2021.autosimtrans-1
Month: June
Year: 2021
Address: Online
Venues: AutoSimTrans | NAACL
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2021.autosimtrans-1
PDF: https://aclanthology.org/2021.autosimtrans-1.pdf

ICT’s System for AutoSimTrans 2021: Robust Char-Level Simultaneous Translation
Shaolei Zhang | Yang Feng

Simultaneous translation (ST) outputs the translation while still reading the input sentence and is an important component of simultaneous interpretation. In this paper, we describe our submitted ST system, which won first place in the streaming transcription input track of the Chinese-English translation task of AutoSimTrans 2021. Aiming at the robustness of ST, we first propose char-level simultaneous translation and apply the wait-k policy to it. Meanwhile, we apply two data processing methods and combine two training methods for domain adaptation. Our method gives the ST model stronger robustness and domain adaptability. Experiments on streaming transcription show that our method outperforms the baseline at all latency levels; at low latency in particular, it improves by about 6 BLEU. In addition, our ablation studies verify the effectiveness of each module in the proposed method.
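
For readers unfamiliar with the wait-k policy mentioned in this abstract, the following is a minimal sketch of its read/write schedule at the character level. It is not the authors' implementation: the function names `wait_k_schedule` and `visible_prefixes` are hypothetical, and the sketch assumes the standard wait-k definition, in which the model reads k source units before writing the first target token and then reads one more source unit per target token until the source is exhausted.

```python
from typing import List


def wait_k_schedule(source_len: int, target_len: int, k: int) -> List[int]:
    """Return g(t) for t = 1..target_len under a wait-k policy.

    g(t) is the number of source units visible when target token t is
    written: g(t) = min(k + t - 1, source_len).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


def visible_prefixes(source: str, target_len: int, k: int) -> List[str]:
    """Char-level view: the source prefix available at each target step.

    Treating each character as one source unit is the assumption that
    makes this a char-level schedule (hypothetical helper, for
    illustration only).
    """
    schedule = wait_k_schedule(len(source), target_len, k)
    return [source[:g] for g in schedule]


if __name__ == "__main__":
    # With k = 2, the first target token sees 2 source characters,
    # the second sees 3, and so on until the source is fully read.
    for step, prefix in enumerate(visible_prefixes("你好世界", 3, 2), start=1):
        print(step, prefix)
```

A smaller k lowers latency because translation starts earlier, at the cost of less source context per target token.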

BIT’s system for AutoSimulTrans2021
Mengge Liu | Shuoying Chen | Minqin Li | Zhipeng Wang | Yuhang Guo

In this paper, we introduce our Chinese-English simultaneous translation system for AutoSimulTrans2021. In simultaneous translation, translation quality and delay are both important. To reduce the translation delay, we cut the streaming-input source sentence into segments and translate the segments before the full sentence is received. To obtain high-quality translations, we pre-train a translation model on an adequate corpus and fine-tune it with domain adaptation and sentence length adaptation. The experimental results on the evaluation data show that our system performs better than the baseline system.

XMU’s Simultaneous Translation System at NAACL 2021
Shuangtao Li | Jinming Hu | Boli Wang | Xiaodong Shi | Yidong Chen

This paper describes our two systems submitted to the simultaneous translation evaluation at the 2nd Workshop on Automatic Simultaneous Translation.

System Description on Automatic Simultaneous Translation Workshop
Linjie Chen | Jianzong Wang | Zhangcheng Huang | Xiongbin Ding | Jing Xiao

This paper describes our submission to the second automatic simultaneous translation workshop at NAACL 2021. We participate in both Chinese-to-English tracks: Chinese audio → English text and Chinese text → English text. We apply data filtering and model training techniques to obtain the best BLEU score and reduce the average lagging. We propose a two-stage simultaneous translation pipeline composed of QuartzNet and a BPE-based Transformer. The resulting system is competitive and achieves a BLEU score of 24.39 in the audio input track.

BSTC: A Large-Scale Chinese-English Speech Translation Dataset
Ruiqing Zhang | Xiyang Wang | Chuanqiang Zhang | Zhongjun He | Hua Wu | Zhi Li | Haifeng Wang | Ying Chen | Qinfei Li

This paper presents BSTC (Baidu Speech Translation Corpus), a large-scale Chinese-English speech translation dataset. The dataset is built from a collection of licensed videos of talks and lectures and includes about 68 hours of Mandarin speech, its manual transcripts and English translations, and automated transcripts produced by an automatic speech recognition (ASR) model. We have further asked three experienced interpreters to simultaneously interpret the test talks in a mock conference setting. This corpus is expected to promote research on automatic simultaneous translation as well as the development of practical systems. We have organized simultaneous translation tasks and used this corpus to evaluate automatic simultaneous translation systems.

Findings of the Second Workshop on Automatic Simultaneous Translation
Ruiqing Zhang | Chuanqiang Zhang | Zhongjun He | Hua Wu | Haifeng Wang

This paper presents the results of the shared task of the 2nd Workshop on Automatic Simultaneous Translation (AutoSimTrans). The task includes two tracks, one for text-to-text translation and one for speech-to-text translation, requiring participants to build systems that translate either the source text or the source speech into target text. Unlike traditional machine translation, the AutoSimTrans shared task evaluates not only translation quality but also latency. To rank the submissions, we propose a metric, “Monotonic Optimal Sequence” (MOS), that considers both quality and latency. We also discuss some important open issues in simultaneous translation.