Enhanced Simultaneous Machine Translation with Word-level Policies

Kang Kim, Hankyu Cho


Abstract
Recent years have seen remarkable advances in the field of Simultaneous Machine Translation (SiMT) due to the introduction of innovative policies that dictate whether to READ or WRITE at each step of the translation process. However, a common assumption in many existing studies is that operations are carried out at the subword level, even though the standard unit for input and output in most practical scenarios is typically at the word level. This paper demonstrates that policies devised and validated at the subword level are surpassed by those operating at the word level, which process multiple subwords to form a complete word in a single step. Additionally, we suggest a method to boost SiMT models using language models (LMs), wherein the proposed word-level policy plays a vital role in addressing the subword disparity between LMs and SiMT models. Code is available at https://github.com/xl8-ai/WordSiMT.
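The core idea the abstract describes is that a word-level policy treats all subwords of a word as one unit, reading or writing them in a single step. Below is a minimal illustrative sketch (not the authors' implementation, and no model or policy is assumed) of the grouping step this requires, using the common SentencePiece convention that "▁" marks the first subword of each word:

```python
# Illustrative sketch: merge a SentencePiece-style subword sequence
# into word-level groups, so a word-level READ/WRITE policy can
# consume or emit one full word (several subwords) per step.
# Assumes the "▁" word-boundary convention; BPE "@@" markers would
# need a different check.

def group_subwords_into_words(subwords):
    """Group a flat subword list into lists of subwords, one per word."""
    words = []
    for tok in subwords:
        if tok.startswith("▁") or not words:
            words.append([tok])      # a new word begins here
        else:
            words[-1].append(tok)    # continuation of the current word
    return words

# A word-level READ consumes all subwords of the next source word
# at once, instead of one subword at a time.
source = ["▁simul", "taneous", "▁trans", "lation"]
print(group_subwords_into_words(source))
# [['▁simul', 'taneous'], ['▁trans', 'lation']]
```

This grouping is also what lets a word-level SiMT model align with a language model that uses a different subword vocabulary, since both can be compared at shared word boundaries.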
Anthology ID: 2023.findings-emnlp.1045
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 15616–15634
URL: https://aclanthology.org/2023.findings-emnlp.1045
DOI: 10.18653/v1/2023.findings-emnlp.1045
Cite (ACL): Kang Kim and Hankyu Cho. 2023. Enhanced Simultaneous Machine Translation with Word-level Policies. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 15616–15634, Singapore. Association for Computational Linguistics.
Cite (Informal): Enhanced Simultaneous Machine Translation with Word-level Policies (Kim & Cho, Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.1045.pdf