Isao Goto

2026

Probabilistic Bilingual Subword Segmentation with Latent Subword Alignment
Shoto Nishida | Daiki Matsui | Takashi Ninomiya | Isao Goto | Akihiro Tamura
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

This study proposes a method for learning subword correspondences in parallel sentence pairs using the EM algorithm. Conventional neural machine translation typically employs subword segmentation models trained. However, since existing methods do not consider parallel relationships, inconsistencies in word segmentation between source and target languages may hinder translation model training. Our approach leverages direct modeling of subword correspondences in parallel corpora, thereby improving segmentation consistency across languages. Experiments across multiple machine translation tasks confirm that our proposed method improves translation accuracy for many tasks.

pdf bib abs

Domain Adaptation of Image Encoder for Multimodal Manga Translation
Kota Manabe | Tomoyuki Kajiwara | Takashi Ninomiya | Isao Goto | Shonosuke Ishiwatari | Hiroshi Noji
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

The objective of this paper is to enhance machine translation for manga (Japanese comics) by developing and employing an image encoder that is capable of more accurately comprehending its visual context. Conventional manga machine translation systems have faced the challenge of lacking sufficient manga comprehension capabilities when utilizing image information. To address this issue, we propose a domain-adapted image encoder training method for manga. The proposed method involves training encoders to acquire visual features that consider the structural and sequential characteristics of the manga. This approach draws upon a technique that has proven to be highly effective in training language models. The image encoders trained by the proposed methods are used as visual processors in a multimodal machine translation model, and they are evaluated in a Japanese-English translation task. The experimental results demonstrate that the proposed method enhances the performance metrics for translation evaluation, such as BLEU and xCOMET, in comparison to the conventional method.

2025

pdf bib

Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025)
Takashi Tsunakawa | Katsuhito Sudoh | Isao Goto
Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025)

pdf bib

Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025)
Toshiaki Nakazawa | Isao Goto
Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025)

pdf bib abs

This paper presents the results and findings of the first shared task of translating patent claims. We provide training, development, and test data for participants and perform human evaluation of the submitted translations. This time, 2 teams submitted their translation results. Our analysis of the human-annotated translation errors revealed not only general, domain-independent errors but also errors specific to patent translation. We also found that the human annotation itself exhibited some serious issues. In this paper, we report on these findings.

pdf bib abs

Ehime-U System with Judge and Refinement, Specialized Prompting, and Few-shot for the Patent Claim Translation Task at WAT 2025
Taishi Edamatsu | Isao Goto | Takashi Ninomiya
Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025)

The Ehime University team participated inthe Japanese-to-English Patent Claim Trans-lation Task at WAT 2025. We experimentedwith (i) Judge and Refinement, (ii) SpecializedPrompting, and (iii) Few-Shot approaches, us-ing GPT-5 as the underlying LLM. Evaluationbased on the LLM-as-a-Judge framework con-firmed improvements for (i), while (ii) and (iii)showed no significant effects.

pdf bib abs

Enriching the Low-Resource Neural Machine Translation with Large Language Model
Sachin Giri | Takashi Ninomiya | Isao Goto
The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Improving the performance of neural machine translation for low-resource languages is challenging due to the limited availability of parallel corpora. However, recently available Large Language Models (LLM) have demonstrated superior performance in various natural language processing tasks, including translation. In this work, we propose to incorporate an LLM into a Machine Translation (MT) model as a prior distribution to leverage its translation capabilities. The LLM acts as a teacher, instructing the student MT model about the target language. We conducted an experiment in four language pairs: English ⇔ German and English ⇔ Hindi. This resulted in improved BLEU and COMET scores in a low-resource setting.

2024

pdf bib abs

Findings of the WMT 2024 Shared Task on Non-Repetitive Translation
Kazutaka Kinugawa | Hideya Mino | Isao Goto | Naoto Shirai
Proceedings of the Ninth Conference on Machine Translation

The repetition of words in an English sentence can create a monotonous or awkward impression. In such cases, repetition should be avoided appropriately. To evaluate the performance of machine translation (MT) systems in avoiding such repetition and outputting more polished translations, we presented the shared task of controlling the lexical choice of MT systems. From Japanese–English parallel news articles, we collected several hundred sentence pairs in which the source sentences containing repeated words were translated in a style that avoided repetition. Participants were required to encourage the MT system to output tokens in a non-repetitive manner while maintaining translation quality. We conducted human and automatic evaluations of systems submitted by two teams based on an encoder-decoder Transformer and a large language model, respectively. From the experimental results and analysis, we report a series of findings on this task.

pdf bib

Proceedings of the Eleventh Workshop on Asian Translation (WAT 2024)
Toshiaki Nakazawa | Isao Goto
Proceedings of the Eleventh Workshop on Asian Translation (WAT 2024)

2023

pdf bib

pdf bib abs

This paper presents the results of the shared tasks from the 10th workshop on Asian translation (WAT2023). For the WAT2023, 2 teams submitted their translation results for the human evaluation. We also accepted 1 research paper. About 40 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2022

pdf bib abs

This paper presents the results of the shared tasks from the 9th workshop on Asian translation (WAT2022). For the WAT2022, 8 teams submitted their translation results for the human evaluation. We also accepted 4 research papers. About 300 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2021

pdf bib abs

This paper describes the system of our team (NHK) for the WAT 2021 Japanese-English restricted machine translation task. In this task, the aim is to improve quality while maintaining consistent terminology for scientific paper translation. This task has a unique feature, where some words in a target sentence are given in addition to a source sentence. In this paper, we use a lexically-constrained neural machine translation (NMT), which concatenates the source sentence and constrained words with a special token to input them into the encoder of NMT. The key to the successful lexically-constrained NMT is the way to extract constraints from a target sentence of training data. We propose two extraction methods: proper-noun constraint and mistranslated-word constraint. These two methods consider the importance of words and fallibility of NMT, respectively. The evaluation results demonstrate the effectiveness of our lexical-constraint method.

This paper presents the results of the shared tasks from the 8th workshop on Asian translation (WAT2021). For the WAT2021, 28 teams participated in the shared tasks and 24 teams submitted their translation results for the human evaluation. We also accepted 5 research papers. About 2,100 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2020

pdf bib abs

In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Though the existing methods try to overcome this problem by using tags for distinguishing the differences between corpora, it is not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.

pdf bib abs

Effective Use of Target-side Context for Neural Machine Translation
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 28th International Conference on Computational Linguistics

pdf bib abs

Neural Machine Translation Using Extracted Context Based on Deep Analysis for the Japanese-English Newswire Task at WAT 2020
Isao Goto | Hideya Mino | Hitoshi Ito | Kazutaka Kinugawa | Ichiro Yamada | Hideki Tanaka
Proceedings of the 7th Workshop on Asian Translation

This paper describes the system of the NHK-NES team for the WAT 2020 Japanese–English newswire task. There are two main problems in Japanese-English news translation: translation of dropped subjects and compatibility between equivalent translations and English news-style outputs. We address these problems by extracting subjects from the context based on predicate-argument structures and using them as additional inputs, and constructing parallel Japanese-English news sentences equivalently translated from English news sentences. The evaluation results confirm the effectiveness of our context-utilization method.

pdf bib

pdf bib abs

This paper presents the results of the shared tasks from the 7th workshop on Asian translation (WAT2020). For the WAT2020, 20 teams participated in the shared tasks and 14 teams submitted their translation results for the human evaluation. We also received 12 research paper submissions out of which 7 were accepted. About 500 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2019

pdf bib

pdf bib abs

Neural Machine Translation System using a Content-equivalently Translated Parallel Corpus for the Newswire Translation Tasks at WAT 2019
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Hideki Tanaka | Takenobu Tokunaga
Proceedings of the 6th Workshop on Asian Translation

This paper describes NHK and NHK Engineering System (NHK-ES)’s submission to the newswire translation tasks of WAT 2019 in both directions of Japanese→English and English→Japanese. In addition to the JIJI Corpus that was officially provided by the task organizer, we developed a corpus of 0.22M sentence pairs by manually, translating Japanese news sentences into English content- equivalently. The content-equivalent corpus was effective for improving translation quality, and our systems achieved the best human evaluation scores in the newswire translation tasks at WAT 2019.

pdf bib abs

This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task. For the WAT2019, 25 teams participated in the shared tasks. We also received 10 research paper submissions out of which 61 were accepted. About 400 translation results were submitted to the automatic evaluation server, and selected submis- sions were manually evaluated.

2018

pdf bib

2017

pdf bib abs

This paper presents the results of the shared tasks from the 4th workshop on Asian translation (WAT2017) including J↔E, J↔C scientific paper translation subtasks, C↔J, K↔J, E↔J patent translation subtasks, H↔E mixed domain subtasks, J↔E newswire subtasks and J↔E recipe subtasks. For the WAT2017, 12 institutions participated in the shared tasks. About 300 translation results have been submitted to the automatic evaluation server, and selected submissions were manually evaluated.

pdf bib

Proceedings of the 4th Workshop on Asian Translation (WAT2017)
Toshiaki Nakazawa | Isao Goto
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

pdf bib abs

Detecting Untranslated Content for Neural Machine Translation
Isao Goto | Hideki Tanaka
Proceedings of the First Workshop on Neural Machine Translation

Despite its promise, neural machine translation (NMT) has a serious problem in that source content may be mistakenly left untranslated. The ability to detect untranslated content is important for the practical use of NMT. We evaluate two types of probability with which to detect untranslated content: the cumulative attention (ATN) probability and back translation (BT) probability from the target sentence to the source sentence. Experiments on detecting untranslated content in Japanese-English patent translations show that ATN and BT are each more effective than random choice, BT is more effective than ATN, and the combination of the two provides further improvements. We also confirmed the effectiveness of using ATN and BT to rerank the n-best NMT outputs.

2016

pdf bib

pdf bib abs

This paper presents the results of the shared tasks from the 3rd workshop on Asian translation (WAT2016) including J ↔ E, J ↔ C scientific paper translation subtasks, C ↔ J, K ↔ J, E ↔ J patent translation subtasks, I ↔ E newswire subtasks and H ↔ E, H ↔ J mixed domain subtasks. For the WAT2016, 15 institutions participated in the shared tasks. About 500 translation results have been submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2015

pdf bib

The “News Web Easy” news service as a resource for teaching and learning Japanese: An assessment of the comprehension difficulty of Japanese sentence-end expressions
Hideki Tanaka | Tadashi Kumano | Isao Goto
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

pdf bib

Japanese news simplification: tak design, data set construction, and analysis of simplified text
Isao Goto | Hideki Tanaka | Tadashi Kumano
Proceedings of Machine Translation Summit XV: Papers

bib

pdf bib

This paper describes a Multi-language Translation Example Browser, a type of translation memory system. The system is able to retrieve translation examples from bilingual news databases, which consist of news transcripts of past broadcasts. We put a Japanese-English system to practical use and undertook trial operations of a system of eight language-pairs.

pdf bib abs

Transliteration considering context information based on the maximum entropy method
Isao Goto | Naoto Kato | Noriyoshi Uratani | Terumasa Ehara
Proceedings of Machine Translation Summit IX: Papers

This paper proposes a method of automatic transliteration from English to Japanese words. Our method successfully transliterates an English word not registered in any bilingual or pronunciation dictionaries by converting each partial letters in the English word into Japanese katakana characters. In such transliteration, identical letters occurring in different English words must often be converted into different katakana. To produce an adequate transliteration, the proposed method considers chunking of alphabetic letters of an English word into conversion units and considers English and Japanese context information simultaneously to calculate the plausibility of conversion. We have confirmed experimentally that the proposed method improves the conversion accuracy by 63% compared to a simple method that ignores the plausibility of chunking and contextual information.

Co-authors

Venues

NGT1

WMT1

Isao Goto

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2004

2003

Co-authors

Venues