Improving Japanese-English Patent Claim Translation with Clause Segmentation Models based on Word Alignment

Masato Nishimura; Kosei Buma; Takehito Utsuro; Masaaki Nagata

Abstract

In patent documents, the claims are a particularly important section because they define the scope of protection. However, due to the length and unique formatting of these sentences, neural machine translation (NMT) systems are prone to translation errors such as omissions and repetitions. To address these challenges, this study proposes a translation method that first segments each source sentence into multiple shorter clauses using a clause segmentation model tailored to facilitate translation. The segmented clauses are then translated by a clause translation model specialized for clause-level translation. Finally, a reordering and editing model rearranges and edits the translated clauses into the final translation. In addition, this study proposes a method for constructing the clause-level parallel corpora required to train the clause segmentation and clause translation models. This method uses word alignment tools to create clause-level data from sentence-level parallel corpora. Experimental results demonstrate that the proposed method achieves statistically significant improvements in BLEU scores compared to conventional NMT models. Furthermore, for sentences where conventional NMT models exhibit omissions and repetitions, the proposed method effectively suppresses these errors, enabling more accurate translations.
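The abstract does not spell out how word alignments are turned into clause-level pairs, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes that source clause boundaries and word alignments (as sets of source/target index pairs) are already available, and it projects each source clause onto the smallest target span covering its aligned tokens, discarding the clause when that span also contains tokens aligned outside the clause. The names `project_clause` and `build_clause_pairs` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]             # half-open token span [start, end)
Alignment = Set[Tuple[int, int]]   # (source_index, target_index) pairs


def project_clause(span: Span, alignment: Alignment) -> Optional[Span]:
    """Project a source clause span onto the target side via word alignment.

    Returns None when the clause has no aligned target tokens, or when the
    projected target span contains a token aligned to a source token outside
    the clause (an assumed consistency criterion, not taken from the paper).
    """
    start, end = span
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None
    lo, hi = min(targets), max(targets) + 1
    for s, t in alignment:
        if lo <= t < hi and not (start <= s < end):
            return None
    return lo, hi


def build_clause_pairs(
    src_tokens: List[str],
    tgt_tokens: List[str],
    clause_spans: List[Span],
    alignment: Alignment,
) -> List[Tuple[str, str]]:
    """Collect (source clause, target clause) pairs for consistent projections."""
    pairs = []
    for s0, s1 in clause_spans:
        projected = project_clause((s0, s1), alignment)
        if projected is None:
            continue  # skip clauses that cannot be cleanly projected
        t0, t1 = projected
        pairs.append((" ".join(src_tokens[s0:s1]), " ".join(tgt_tokens[t0:t1])))
    return pairs


# Toy example: two source clauses whose translations appear in swapped order.
src = ["a", "b", "c", "d"]
tgt = ["C", "D", "A", "B"]
align = {(0, 2), (1, 3), (2, 0), (3, 1)}
clause_pairs = build_clause_pairs(src, tgt, [(0, 2), (2, 4)], align)
```

In this toy input the two clauses map to target spans in reversed order, which is the kind of reordering a later reordering and editing step would need to handle.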

Anthology ID: 2025.mtsummit-1.25
Volume: Proceedings of Machine Translation Summit XX: Volume 1
Month: June
Year: 2025
Address: Geneva, Switzerland
Editors: Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, Sara Szoc
Venue: MTSummit
Publisher: European Association for Machine Translation
Pages: 333–343
URL: https://aclanthology.org/2025.mtsummit-1.25/
Cite (ACL): Masato Nishimura, Kosei Buma, Takehito Utsuro, and Masaaki Nagata. 2025. Improving Japanese-English Patent Claim Translation with Clause Segmentation Models based on Word Alignment. In Proceedings of Machine Translation Summit XX: Volume 1, pages 333–343, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal): Improving Japanese-English Patent Claim Translation with Clause Segmentation Models based on Word Alignment (Nishimura et al., MTSummit 2025)
PDF: https://aclanthology.org/2025.mtsummit-1.25.pdf
