Target Language Monolingual Translation Memory based NMT by Cross-lingual Retrieval of Similar Translations and Reranking

Takuya Tamura, Xiaotian Wang, Takehito Utsuro, Masaaki Nagata


Abstract
Retrieve-Edit-Rerank is a text generation framework composed of three steps: retrieving sentences using the input sentence as a query, generating multiple output sentence candidates, and selecting the final output sentence from these candidates. This simple approach has outperformed other, more complex existing methods. This paper focuses on the retrieval and reranking steps. In the retrieval step, we propose retrieving similar target-language sentences from a target-language monolingual translation memory using language-independent sentence embeddings generated by mSBERT or LaBSE. We demonstrate that this approach significantly outperforms existing methods that rely on monolingual inter-sentence similarity measures such as edit distance, which can only be used with a parallel translation memory. In the reranking step, we propose a new reranking score for selecting the best sentence, which considers both the log-likelihood of each candidate and the sentence-embedding-based similarity between the input and the candidate. We evaluated the proposed method on English-to-Japanese translation using ASPEC and on English-to-French translation using the EU Bookshop Corpus (EUBC). The proposed method significantly exceeded the baseline in BLEU score; in particular, it achieved a 1.4-point improvement over the original Retrieve-Edit-Rerank method on the EUBC dataset.
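The retrieval and reranking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The embeddings are toy vectors standing in for language-independent sentence embeddings such as those produced by mSBERT or LaBSE. The reranking score is assumed to be a simple linear combination, log-likelihood + `alpha` * cosine similarity, where `alpha` is a hypothetical weight; the paper's exact formula may differ. The function names `retrieve` and `rerank` and the `tm_*`/`cand_*` labels are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_emb: Vector, memory: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Return the k target-language TM sentences most similar to the source query.

    Assumes language-independent embeddings, so a source-language query can be
    compared directly with target-language sentences: no parallel TM is needed.
    """
    scored = [(sent, cosine(query_emb, emb)) for sent, emb in memory.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(query_emb: Vector, candidates: List[Tuple[str, float, Vector]], alpha: float) -> str:
    """Pick the candidate maximising log_likelihood + alpha * cosine(query, candidate).

    Each candidate is (text, log_likelihood, embedding). The linear combination
    and the weight `alpha` are illustrative assumptions, not the paper's formula.
    """
    def score(cand: Tuple[str, float, Vector]) -> float:
        _, log_lik, emb = cand
        return log_lik + alpha * cosine(query_emb, emb)

    return max(candidates, key=score)[0]


if __name__ == "__main__":
    # Toy embeddings: the query stands for an English source sentence,
    # the memory entries for target-language TM sentences.
    query = [1.0, 0.0, 0.0]
    memory = {
        "tm_a": [0.9, 0.1, 0.0],
        "tm_b": [0.0, 1.0, 0.0],
        "tm_c": [0.7, 0.7, 0.0],
    }
    top = retrieve(query, memory, k=2)
    print([s for s, _ in top])  # ['tm_a', 'tm_c']

    # Hypothetical decoder outputs: (text, log-likelihood, embedding).
    cands = [
        ("cand_1", -2.0, [0.0, 1.0, 0.0]),
        ("cand_2", -2.5, [1.0, 0.1, 0.0]),
    ]
    print(rerank(query, cands, alpha=0.0))  # cand_1
    print(rerank(query, cands, alpha=1.0))  # cand_2
```

With `alpha = 0`, the reranker reduces to picking the candidate with the highest log-likelihood; a larger `alpha` lets a candidate that is semantically closer to the input win despite a lower model score.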
Anthology ID: 2023.mtsummit-research.26
Volume: Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track
Month: September
Year: 2023
Address: Macau SAR, China
Editors: Masao Utiyama, Rui Wang
Venue: MTSummit
Publisher: Asia-Pacific Association for Machine Translation
Pages: 313–323
URL: https://aclanthology.org/2023.mtsummit-research.26
Cite (ACL): Takuya Tamura, Xiaotian Wang, Takehito Utsuro, and Masaaki Nagata. 2023. Target Language Monolingual Translation Memory based NMT by Cross-lingual Retrieval of Similar Translations and Reranking. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 313–323, Macau SAR, China. Asia-Pacific Association for Machine Translation.
Cite (Informal): Target Language Monolingual Translation Memory based NMT by Cross-lingual Retrieval of Similar Translations and Reranking (Tamura et al., MTSummit 2023)
PDF: https://aclanthology.org/2023.mtsummit-research.26.pdf