Masaaki Nagata

2025

This paper presents the results and findings of the first shared task on patent claim translation. We provide training, development, and test data for participants and perform a human evaluation of the submitted translations. This time, two teams submitted translation results. Our analysis of the human-annotated translation errors revealed not only general, domain-independent errors but also errors specific to patent translation. We also found that the human annotation itself exhibited some serious issues. In this paper, we report on these findings.

Case-Based Decision-Theoretic Decoding with Quality Memories
Hiroyuki Deguchi | Masaaki Nagata
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Minimum Bayes risk (MBR) decoding is a decision rule for text generation that selects the hypothesis maximizing the expected utility, and it generates higher-quality texts more robustly than maximum a posteriori (MAP) decoding. However, it depends on sample texts drawn from the text generation model, so it has difficulty finding a hypothesis that correctly captures out-of-domain knowledge or information. To tackle this issue, we propose case-based decision-theoretic (CBDT) decoding, another method that estimates the expected utility using examples of domain data. CBDT decoding not only generates higher-quality texts than MAP decoding, but the combination of MBR and CBDT decoding also outperforms MBR decoding in De–En and Ja↔En translation tasks covering seven domains and in image captioning tasks on the MSCOCO and nocaps datasets.
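
The MBR decision rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the unigram-F1 utility is a toy stand-in for metrics such as BLEU or COMET, and the sampled hypotheses double as pseudo-references, as in standard sampling-based MBR. CBDT decoding would instead estimate the expected utility from domain examples; that variant is not shown.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 between two whitespace-tokenized strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(hypotheses, pseudo_references, utility=unigram_f1):
    """Return the hypothesis with the highest average utility against the pseudo-references."""
    def expected_utility(hyp):
        return sum(utility(hyp, ref) for ref in pseudo_references) / len(pseudo_references)
    return max(hypotheses, key=expected_utility)


samples = ["the cat sat", "a cat sat down", "the cat sat down"]
print(mbr_decode(samples, samples))  # the cat sat down
```

The selected output is the "consensus" sample that agrees most with the others, which is why MBR is robust but also bounded by what the model's own samples contain.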

Improving Word Alignment Using Semi-Supervised Learning
Zhongtao Miao | Qiyu Wu | Masaaki Nagata | Yoshimasa Tsuruoka
Findings of the Association for Computational Linguistics: ACL 2025

Word alignment plays a crucial role in various natural language processing tasks, such as providing cross-lingual signals for sentence embedding, reducing hallucination and omission in machine translation, and facilitating the construction of training data for simultaneous speech translation. Current state-of-the-art approaches usually rely on (1) supervised data and large-scale weakly supervised data constructed from Wikipedia and (2) multilingual Transformer encoder-based models. However, we find that the current state-of-the-art encoder-based method, BinaryAlign, suffers from insufficient labeled data, and we improve it further through self-training with a small amount of parallel data. In addition, given the impressive performance of multilingual large language models on many natural language processing tasks, we also explore using these decoder-based large language models as word aligners. We observe that although fine-tuning large language models on labeled data produces acceptable results, augmenting the training with pseudo-labeled data further enhances performance. Based on these findings, we propose a semi-supervised framework to improve large language model-based word aligners. Experimental results demonstrate that the proposed method, using a small amount of parallel data, outperforms the current state-of-the-art method on various word alignment datasets.

This paper presents the results of the General Machine Translation Task organized as part of the 2025 Conference on Machine Translation (WMT). Participants were invited to build systems for any of 30 language pairs. For half of these pairs, we conducted a human evaluation on test sets spanning four to five different domains. We evaluated 60 systems in total: 36 submitted by participants and 24 for which we collected translations from large language models (LLMs) and popular online translation providers. This year, we focused on creating challenging test sets by developing a difficulty sampling technique and using more complex source data. We evaluated system outputs with professional annotators using the Error Span Annotation (ESA) protocol, except for two language pairs, for which we used Multidimensional Quality Metrics (MQM) instead. We continued the trend of increasingly moving towards document-level translation, providing the source texts as whole documents containing multiple paragraphs.

Two Step Automatic Post Editing of Patent Machine Translation based on Pre-trained Encoder Models and LLMs
Kosei Buma | Takehito Utsuro | Masaaki Nagata
The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

We study automatic post-editing for patent translation, where accuracy and traceability are critical, and propose a two-step pipeline that combines a multilingual encoder for token-level error detection with an LLM for targeted correction. As no word-level annotations exist for Japanese–English patents, we create supervised data by injecting synthetic errors into parallel patent sentences and fine-tune mBERT, XLM-RoBERTa, and mDeBERTa as detectors. In the second stage, GPT-4o is prompted to revise translations either freely or under a restricted policy that allows edits only on detector-marked spans. For error detection, evaluation on synthetic errors shows that encoder-based detectors outperform LLMs in both F1 and MCC. For error correction, tests on synthetic, repetition, and omission datasets demonstrate statistically significant BLEU gains over LLM methods for synthetic and repetition errors, while omission errors remain challenging. Overall, pairing compact encoders with an LLM enables more accurate and controllable post-editing for key patent error types, reducing unnecessary rewrites via restricted edits. Future work will focus on strengthening omission modeling to better detect and correct missing content.
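
The data-creation step above (injecting synthetic errors into clean translations to obtain token-level labels) might look roughly like this. The two error types (substitution and repetition), the rate, and the labeling scheme are assumptions for illustration; the abstract does not specify the paper's actual corruption procedure.

```python
import random


def inject_errors(tokens, vocab, rate=0.15, seed=0):
    """Corrupt a clean translation and return (noisy_tokens, labels).

    labels[i] == 1 marks noisy_tokens[i] as an injected error, 0 as correct.
    Error types (hypothetical): substitution and repetition.
    """
    rng = random.Random(seed)
    noisy, labels = [], []
    for tok in tokens:
        r = rng.random()
        if r < rate / 2 and any(w != tok for w in vocab):
            # Substitution: replace the token with a different vocabulary word.
            noisy.append(rng.choice([w for w in vocab if w != tok]))
            labels.append(1)
        elif r < rate:
            # Repetition: keep the token, then repeat it; only the copy is an error.
            noisy.extend([tok, tok])
            labels.extend([0, 1])
        else:
            noisy.append(tok)
            labels.append(0)
    return noisy, labels


tokens = "the device comprising a sensor".split()
vocab = ["claim", "device", "wherein", "comprising"]
noisy, labels = inject_errors(tokens, vocab, rate=0.5, seed=1)
print(list(zip(noisy, labels)))
```

Pairs like these are the kind of data a token-classification encoder such as mBERT could be fine-tuned on; mapping word-level labels onto subword tokens is omitted here.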

Patent Claim Translation via Continual Pre-training of Large Language Models with Parallel Data
Haruto Azami | Minato Kondo | Takehito Utsuro | Masaaki Nagata
Proceedings of Machine Translation Summit XX: Volume 1

Recent advancements in large language models (LLMs) have enabled their application across various domains. However, in the field of patent translation, Transformer encoder-decoder models remain the standard approach, and the potential of LLMs for translation tasks has not been thoroughly explored. In this study, we performed patent claim translation using an LLM trained on parallel data through continual pre-training and supervised fine-tuning, following the methodology proposed by Guo et al. (2024) and Kondo et al. (2024). A comparative evaluation against Transformer encoder-decoder translations showed that the LLM achieved high BLEU and COMET scores and reduced issues such as omissions and repetitions. Nonetheless, hallucination errors, which were not observed with the traditional models, occurred in some cases and degraded translation quality. This study highlights the promise of LLMs for patent translation while identifying challenges that warrant further investigation.

Improving Japanese-English Patent Claim Translation with Clause Segmentation Models based on Word Alignment
Masato Nishimura | Kosei Buma | Takehito Utsuro | Masaaki Nagata
Proceedings of Machine Translation Summit XX: Volume 1

In patent documents, the claims are a particularly important section because they define the scope of patent protection. However, due to the length and unique formatting of claim sentences, neural machine translation (NMT) systems are prone to translation errors such as omissions and repetitions. To address these challenges, this study proposes a translation method that first segments the source sentences into multiple shorter clauses using a clause segmentation model tailored to facilitate translation. These segmented clauses are then translated using a clause translation model specialized for clause-level translation. Finally, the translated clauses are rearranged and edited into the final translation using a reordering and editing model. In addition, this study proposes a method for constructing the clause-level parallel corpora required to train the clause segmentation and clause translation models. This method leverages word alignment tools to create clause-level data from sentence-level parallel corpora. Experimental results demonstrate that the proposed method achieves statistically significant improvements in BLEU scores compared to conventional NMT models. Furthermore, for sentences where conventional NMT models exhibit omissions and repetitions, the proposed method effectively suppresses these errors, enabling more accurate translations.
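
One simple way to derive clause-level pairs from word alignments is to project each source clause onto the smallest span of target words aligned to it. This is a generic span-projection sketch under our own assumptions (clauses given as half-open token spans, alignments as (source, target) index pairs from an external aligner); the paper's procedure may differ, for example in how it handles overlapping or non-contiguous projections.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices


def project_clauses(src_clauses: List[Span],
                    alignment: List[Tuple[int, int]]) -> List[Optional[Span]]:
    """Project each source clause onto the smallest target span covering its aligned words."""
    projected = []
    for start, end in src_clauses:
        tgt = [j for i, j in alignment if start <= i < end]
        # Unaligned clauses get None so the caller can drop or repair them.
        projected.append((min(tgt), max(tgt) + 1) if tgt else None)
    return projected


# Two source clauses whose translations appear in swapped order on the target side.
print(project_clauses([(0, 2), (2, 4)], [(0, 2), (1, 3), (2, 0), (3, 1)]))
```

The swapped output spans illustrate why a separate reordering step is needed when clause translations are reassembled.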

BiMax: Bidirectional MaxSim Score for Document-Level Alignment
Xiaotian Wang | Takehito Utsuro | Masaaki Nagata
Findings of the Association for Computational Linguistics: EMNLP 2025

Document alignment, which aligns documents across source and target languages within the same web domain, is necessary for hierarchical mining. Several high-precision sentence embedding-based methods have been developed, such as TK-PERT and Optimal Transport (OT). However, given the massive scale of web mining data, both accuracy and speed must be considered. In this paper, we propose a cross-lingual Bidirectional MaxSim score (BiMax) for computing document-to-document similarity, improving efficiency compared to the OT method. On the WMT16 bilingual document alignment task, BiMax attains accuracy comparable to OT with an approximately 100-fold speedup. We also conduct a comprehensive analysis of the performance of current state-of-the-art multilingual sentence embedding models.
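
A bidirectional MaxSim score can be sketched with NumPy. The aggregation below (mean of per-segment maximum cosine similarities in each direction, then averaged) is our assumption about the general shape of BiMax, not the paper's exact definition. It needs only one matrix product per document pair, which suggests why such a score can be far cheaper than solving an optimal transport problem.

```python
import numpy as np


def _normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def bimax(src_emb: np.ndarray, tgt_emb: np.ndarray) -> float:
    """Symmetric MaxSim between two documents, each an (n_segments, dim) embedding array."""
    sim = _normalize(src_emb) @ _normalize(tgt_emb).T  # pairwise cosine similarities
    src_to_tgt = sim.max(axis=1).mean()  # best target match for each source segment
    tgt_to_src = sim.max(axis=0).mean()  # best source match for each target segment
    return float((src_to_tgt + tgt_to_src) / 2)


doc_a = np.eye(2)                 # two segment embeddings
doc_b = np.array([[1.0, 0.0]])    # one segment, matching only the first of doc_a
print(bimax(doc_a, doc_b))
```

Averaging both directions penalizes a document that matches only part of the other, which a one-directional MaxSim would miss.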

2024

Argument Mining as a Text-to-Text Generation Task
Masayuki Kawarada | Tsutomu Hirao | Wataru Uchida | Masaaki Nagata
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Argument Mining (AM) aims to uncover the argumentative structures within a text. Previous methods require several subtasks, such as span identification, component classification, and relation classification. Consequently, these methods need rule-based postprocessing to derive argumentative structures from the output of each subtask. This approach adds to the complexity of the model and expands the search space of the hyperparameters. To address this difficulty, we propose a simple yet strong method based on a text-to-text generation approach using a pretrained encoder-decoder language model. Our method simultaneously generates argumentatively annotated text for spans, components, and relations, eliminating the need for task-specific postprocessing and hyperparameter tuning. Furthermore, because it is a straightforward text-to-text generation method, we can easily adapt our approach to various types of argumentative structures. Experimental results demonstrate the effectiveness of our method, as it achieves state-of-the-art performance on three different types of benchmark datasets: the Argument-annotated Essays Corpus (AAEC), AbstRCT, and the Cornell eRulemaking Corpus (CDCP).

JaParaPat: A Large-Scale Japanese-English Parallel Patent Application Corpus
Masaaki Nagata | Makoto Morishita | Katsuki Chousa | Norihito Yasuda
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We constructed JaParaPat (Japanese-English Parallel Patent Application Corpus), a bilingual corpus of more than 300 million Japanese-English sentence pairs from patent applications published in Japan and the United States from 2000 to 2021. We obtained the published unexamined patent applications from the Japan Patent Office (JPO) and the United States Patent and Trademark Office (USPTO). We also obtained patent family information from DOCDB, a bibliographic database maintained by the European Patent Office (EPO). Based on the patent families, we extracted approximately 1.4M Japanese-English document pairs that are translations of each other, and from these document pairs we extracted about 350M sentence pairs using a translation-based sentence alignment method whose initial translation model is bootstrapped from a dictionary-based sentence alignment. In our experiments, adding more than 300M sentence pairs obtained from patent applications to 22M sentence pairs obtained from the web improved patent translation accuracy by 20 BLEU points.

Detector–Corrector: Edit-Based Automatic Post Editing for Human Post Editing
Hiroyuki Deguchi | Masaaki Nagata | Taro Watanabe
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

Post-editing is crucial in the real world because neural machine translation (NMT) sometimes makes errors. Automatic post-editing (APE) attempts to correct the outputs of an MT model to improve translation quality. However, many APE models are based on sequence generation, so their decisions are harder for actual users to interpret. In this paper, we propose "detector–corrector", an edit-based post-editing model that breaks the editing process into two steps: error detection and error correction. The detector model tags each MT output token according to whether it should be corrected and/or reordered, while the corrector model generates corrected words for the spans identified as errors by the detector. Experiments on the WMT'20 English–German and English–Chinese APE tasks showed that our detector–corrector improved the translation edit rate (TER) compared to the previous edit-based model and a black-box sequence-to-sequence APE model. In addition, our model is more explainable because it is based on edit operations.
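
The two-step edit process can be illustrated with a detector that emits per-token tags and a corrector that rewrites only the flagged spans. The tag names (`KEEP`, `ERR`) and the `corrector` callback are hypothetical placeholders for the trained models, and reordering tags are omitted for brevity.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices


def error_spans(tags: Sequence[str]) -> List[Span]:
    """Group contiguous 'ERR' tags into half-open spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "ERR" and start is None:
            start = i
        elif tag != "ERR" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def apply_corrections(tokens: List[str], tags: Sequence[str],
                      corrector: Callable[[List[str], Span], List[str]]) -> List[str]:
    """Replace each detected error span with the corrector's output; copy all other tokens."""
    out, prev = [], 0
    for start, end in error_spans(tags):
        out.extend(tokens[prev:start])          # untouched tokens before the span
        out.extend(corrector(tokens, (start, end)))
        prev = end
    out.extend(tokens[prev:])                    # untouched tail
    return out


mt = ["he", "go", "to", "school"]
tags = ["KEEP", "ERR", "KEEP", "KEEP"]
print(apply_corrections(mt, tags, lambda toks, span: ["goes"]))
```

Because only flagged spans can change, every edit in the output can be traced back to a specific detector decision, which is the source of the explainability claimed above.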

Word Alignment as Preference for Machine Translation
Qiyu Wu | Masaaki Nagata | Zhongtao Miao | Yoshimasa Tsuruoka
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Hallucination and omission, long-standing problems in machine translation (MT), are more pronounced when a large language model (LLM) is used for MT because an LLM itself is susceptible to these phenomena. In this work, we mitigate the problem in an LLM-based MT model by guiding it toward better word alignment. We first study the correlation between word alignment and the phenomena of hallucination and omission in MT. We then propose using word alignment as a preference signal to optimize the LLM-based MT model. The preference data are constructed by selecting chosen and rejected translations from the outputs of multiple MT tools. Direct preference optimization is then used to optimize the LLM-based model towards the preference signal. Given the absence of evaluators specifically designed for hallucination and omission in MT, we further propose selecting hard instances and using GPT-4 to directly evaluate how well the models mitigate these issues. We verify the validity of these evaluation methods experimentally, and extensive results demonstrate the effectiveness of word alignment-based preference optimization in mitigating hallucination and omission. However, although the approach shows promise in mitigating hallucination and omission, overall MT performance across language directions remains mixed, with slight increases in BLEU and decreases in COMET.
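
Preference data of the kind described above could be built roughly as follows. The coverage-based score is a hypothetical proxy chosen for illustration (unaligned source words suggest omission, unaligned target words suggest hallucination), not the paper's exact criterion, and the alignments are assumed to come from an external word aligner. The output uses the prompt/chosen/rejected record layout commonly used for DPO training.

```python
from typing import Dict, List, Tuple

Alignment = List[Tuple[int, int]]  # (source index, target index) pairs


def coverage_score(n_src: int, n_tgt: int, alignment: Alignment) -> float:
    """Hypothetical score: mean of source-side and target-side alignment coverage."""
    if n_src == 0 or n_tgt == 0:
        return 0.0
    src_cov = len({i for i, _ in alignment}) / n_src  # low value hints at omission
    tgt_cov = len({j for _, j in alignment}) / n_tgt  # low value hints at hallucination
    return (src_cov + tgt_cov) / 2


def build_preference_pair(source: str,
                          candidates: Dict[str, Tuple[str, Alignment]]) -> dict:
    """Pick the best- and worst-covered candidate translations as chosen and rejected."""
    n_src = len(source.split())
    scored = sorted(
        (coverage_score(n_src, len(text.split()), align), text)
        for text, align in candidates.values()
    )
    return {"prompt": source, "chosen": scored[-1][1], "rejected": scored[0][1]}


candidates = {
    "sys_a": ("x y z", [(0, 0), (1, 1), (2, 2)]),    # full coverage
    "sys_b": ("x y", [(0, 0), (1, 1)]),              # drops a source word
    "sys_c": ("x y z w", [(0, 0), (1, 1), (2, 2)]),  # adds an unaligned word
}
print(build_preference_pair("a b c", candidates))
```

In this toy example the omission costs more than the extra word because it removes a third of the source coverage, while the unaligned target word removes only a quarter of the target coverage.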

Document Alignment based on Overlapping Fixed-Length Segments
Xiaotian Wang | Takehito Utsuro | Masaaki Nagata
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Acquiring large-scale parallel corpora is crucial for NLP tasks such as Neural Machine Translation, and web crawling has become a popular methodology for this purpose. Previous studies on aligning web-crawled documents across languages have relied on sentence-based segmentation (SBS). Among them, the TK-PERT method (Thompson and Koehn, 2020) achieved state-of-the-art results and handled the boilerplate text in web-crawled data well through a down-weighting approach. However, how best to encode long texts remains an open problem. We therefore introduce Overlapping Fixed-Length Segmentation (OFLS) in place of SBS and observe a pronounced improvement when applying the same document alignment approaches. In this paper, we compare SBS and OFLS using three previous methods, Mean-Pool, TK-PERT (Thompson and Koehn, 2020), and Optimal Transport (Clark et al., 2019; El-Kishky and Guzman, 2020), on the WMT16 document alignment shared task for French-English, as well as on MnRN, a Japanese-English dataset we constructed. For the WMT16 task, various SBS-based methods showed recall increases of 1% to 10% when re-implemented with OFLS. On the MnRN data, OFLS demonstrated notable accuracy improvements and faster document embedding.
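
Overlapping fixed-length segmentation replaces sentence boundaries with sliding windows over the token sequence. The sketch below assumes a pre-tokenized document and treats window length and stride as free parameters; the values used in the paper are not stated in this abstract. Each window would then be embedded and passed to Mean-Pool, TK-PERT, or OT in place of sentence embeddings.

```python
from typing import List


def overlapping_segments(tokens: List[str], length: int, stride: int) -> List[List[str]]:
    """Split tokens into fixed-length windows that overlap by (length - stride) tokens."""
    if length <= 0 or not 0 < stride <= length:
        raise ValueError("need length > 0 and 0 < stride <= length")
    if len(tokens) <= length:
        return [tokens]
    segments = []
    # The last start is chosen so the final window reaches the end of the document.
    for start in range(0, len(tokens) - length + stride, stride):
        segments.append(tokens[start:start + length])
    return segments


doc = [f"w{i}" for i in range(10)]
for seg in overlapping_segments(doc, length=4, stride=2):
    print(seg)
```

Unlike sentence splitting, every window has a bounded length, so no single segment exceeds the encoder's effective input size, and the overlap keeps content that straddles a window boundary visible in at least one full window.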

This overview paper presents the results of the General Machine Translation Task organised as part of the 2024 Conference on Machine Translation (WMT). In the general MT task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting of three to five different domains. In addition to participating systems, we collected translations from 8 different large language models (LLMs) and 4 online translation providers. We evaluate system outputs with professional human annotators using a new protocol called Error Span Annotations (ESA).

Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data
Minato Kondo | Takehito Utsuro | Masaaki Nagata
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

In this paper, we propose a two-phase training approach in which pre-trained large language models are continually pre-trained on parallel data and then supervised fine-tuned with a small amount of high-quality parallel data. To investigate the effectiveness of our approach, we conducted continual pre-training with a 3.8B-parameter model on parallel data in eight different formats. We evaluate these methods on thirteen test sets for Japanese-to-English and English-to-Japanese translation. The results show that when parallel data are used in continual pre-training, it is essential to alternate between source and target sentences. We also show that translation accuracy improves only for translation directions in which the order of source and target sentences matches between the continual pre-training data and inference. In addition, we show that the LLM-based translation model is more robust in translating spoken language and achieves higher accuracy with less training data than supervised encoder-decoder models. Finally, the highest accuracy is achieved when the continual pre-training data consist of interleaved source and target sentences and tags are added to the source sentences.
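
The best-performing data format described above (interleaved source and target sentences with a tag on the source side) can be sketched as a small formatting function. The tag string `<2en>` and the newline-separated layout are hypothetical choices for illustration; the abstract does not give the exact format used.

```python
from typing import Iterable, Tuple


def interleave_parallel(pairs: Iterable[Tuple[str, str]], src_tag: str = "<2en>") -> str:
    """Build continual pre-training text in which each tagged source sentence is followed by its target."""
    lines = []
    for src, tgt in pairs:
        lines.append(f"{src_tag} {src}")  # hypothetical tag marking the source side
        lines.append(tgt)                 # translation immediately follows its source
    return "\n".join(lines)


pairs = [("猫が好きです。", "I like cats."), ("特許請求の範囲", "Claims")]
print(interleave_parallel(pairs))
```

Because the abstract reports gains only for directions whose source/target order matches the pre-training data, supporting both Japanese-to-English and English-to-Japanese would require also emitting the pairs in reversed order with a corresponding tag.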
