Hiroyuki Deguchi - ACL Anthology

Hiroyuki Deguchi

2026

Hacking Neural Evaluation Metrics with Single Hub Text
Hiroyuki Deguchi | Katsuki Chousa | Yusuke Sakai
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Strongly human-correlated evaluation metrics serve as an essential compass for the development and improvement of generation models and must be highly reliable and robust. Recent embedding-based neural text evaluation metrics, such as COMET for translation tasks, are widely used in both research and development fields. However, there is no guarantee that they yield reliable evaluation results due to the black-box nature of neural networks. To raise concerns about the reliability and safety of such metrics, we propose a method for finding a single adversarial text in the discrete space that is consistently evaluated as high-quality, regardless of the test cases, to identify the vulnerabilities in evaluation metrics. The single hub text found with our method achieved 79.1 COMET% and 67.8 COMET% in the WMT’24 English-to-Japanese (En–Ja) and English-to-German (En–De) translation tasks, respectively, outperforming translations generated individually for each source sentence by using M2M100, a general translation model. Furthermore, we also confirmed that the hub text found with our method generalizes across multiple language pairs such as Ja–En and De–En.

TableMBR: Minimum Bayes Risk Table Generation Based on Structural Consistency
Daiki Yoshida | Hiroyuki Deguchi | Yusuke Sakai | Hidetaka Kamigaito | Taro Watanabe
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

The text-to-table task aims to generate structured data in tabular formats from unstructured text. While the integration of large language models (LLMs) has significantly enhanced the comprehensiveness and flexibility of generation, challenges regarding inconsistent output quality persist, such as the inclusion of redundant information and numerical inaccuracies. We propose TableMBR, a robust table generation method that maintains structural consistency through minimum Bayes risk (MBR) decoding. Experimental results showed that TableMBR outperforms the baseline, achieving relative improvements of up to 15% in F1 score on Rotowire and 23% in accuracy on LiveSum.

One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness
Hiroyuki Deguchi | Katsuki Chousa | Yusuke Sakai
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The hubness problem, in which hub embeddings are close to many unrelated examples, often occurs in high-dimensional embedding spaces and may pose a practical threat to applications such as information retrieval and automatic evaluation metrics. In particular, since cross-modal similarity between text and images cannot be calculated by direct comparisons such as string matching, cross-modal encoders that project different modalities into a shared space are helpful for various cross-modal applications; thus, the existence of hubs may pose practical threats. To reveal the vulnerabilities of cross-modal encoders, we propose a method for identifying the hub embedding and its corresponding hub text. Experiments on image captioning evaluation in MSCOCO and nocaps, along with image-to-text retrieval tasks in MSCOCO and Flickr30k, showed that our method can identify a single hub text that unreasonably achieves similarity scores comparable to or higher than those of human-written reference captions for many images, thereby revealing the vulnerabilities in cross-modal encoders.
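A minimal sketch of why a single hub can score well, assuming unit-normalized embeddings compared by cosine similarity (an assumption for illustration; the paper's search over discrete text is not reproduced here). For unit vectors, the mean cosine similarity of a vector h to a collection equals the dot product of h with the collection's mean, so by the Cauchy–Schwarz inequality the normalized centroid is the unit vector with the highest average similarity. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def hub_direction(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Unit vector maximizing the mean cosine similarity to all embeddings.

    For unit h, mean_i cos(h, x_i) = <h, mean_i (x_i / |x_i|)>, which is
    maximized by the normalized mean of the unit-normalized embeddings.
    """
    units = [normalize(e) for e in embeddings]
    mean = [sum(dim) / len(units) for dim in zip(*units)]
    return normalize(mean)


def mean_similarity(h: Sequence[float], embeddings: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity of h to every embedding in the collection."""
    return sum(cosine(h, e) for e in embeddings) / len(embeddings)


if __name__ == "__main__":
    images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    hub = hub_direction(images)
    print(round(mean_similarity(hub, images), 3))  # 0.577
    print(round(mean_similarity(images[0], images), 3))  # 0.333
```

In this toy collection, the hub direction is on average closer to every item than any single item is, which is the geometric intuition behind the concern that one text embedded near such a direction could outscore genuine captions.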

2025

NTTSU at WMT2025 General Translation Task
Zhang Yin | Hiroyuki Deguchi | Haruto Azami | Guanyu Ouyang | Kosei Buma | Yingyi Fu | Katsuki Chousa | Takehito Utsuro
Proceedings of the Tenth Conference on Machine Translation

This paper presents the NTTSU submission to the constrained track of the English–Japanese and Japanese–Chinese translation directions at the WMT2025 general translation task. For each translation direction, we build translation models from a large language model by combining continual pretraining, supervised fine-tuning, and preference optimization based on translation quality and adequacy. We finally generate translations via context-aware MBR decoding to maximize translation quality and document-level consistency.

Agreement-Constrained Probabilistic Minimum Bayes Risk Decoding
Koki Natsumi | Hiroyuki Deguchi | Yusuke Sakai | Hidetaka Kamigaito | Taro Watanabe
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Minimum Bayes risk (MBR) decoding generates high-quality translations by maximizing the expected utility of output candidates, but it evaluates all pairwise scores over the candidate set; hence, it takes quadratic time with respect to the number of candidates. To reduce the number of utility function calls, probabilistic MBR (PMBR) decoding evaluates quality scores only for sampled pairs of candidates and fills in the missing scores with a matrix completion algorithm. However, its translation quality degrades as the number of utility function calls is reduced. Therefore, to improve the trade-off between quality and cost, we propose agreement-constrained PMBR (AC-PMBR) decoding, which leverages a knowledge-distilled model to guide the completion of the score matrix. Our AC-PMBR decoding reduced the approximation error of matrix completion by up to a factor of 3 and achieved higher translation quality than PMBR decoding at a comparable computational cost on the WMT’23 En↔De translation tasks.
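As a minimal sketch of the vanilla MBR decoding described in the first sentence above (not PMBR or AC-PMBR), the code below treats every candidate as a pseudo-reference and picks the one with the highest average utility. The toy unigram-F1 utility stands in for a neural metric such as COMET and is an assumption; the function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

Utility = Callable[[str, str], float]


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility: F1 of whitespace-token overlap (stand-in for COMET)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates: Sequence[str], utility: Utility = unigram_f1) -> str:
    """Return the candidate with the highest expected utility.

    Each candidate also serves as a pseudo-reference, so `utility` is
    called len(candidates) ** 2 times: the quadratic cost PMBR targets.
    """
    n = len(candidates)
    expected = [sum(utility(h, r) for r in candidates) / n for h in candidates]
    best = max(range(n), key=expected.__getitem__)
    return candidates[best]


if __name__ == "__main__":
    pool = ["a b c", "a b c", "a b d", "x y z"]
    print(mbr_decode(pool))  # a b c
```

The full score matrix here is exactly what PMBR samples from and then completes; replacing the inner `sum` with a partially observed matrix is where the approximation enters.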

Long-Tail Crisis in Nearest Neighbor Language Models
Yuto Nishida | Makoto Morishita | Hiroyuki Deguchi | Hidetaka Kamigaito | Taro Watanabe
Findings of the Association for Computational Linguistics: NAACL 2025

The k-nearest-neighbor language model (kNN-LM), one of the retrieval-augmented language models, improves the perplexity for given text by directly accessing a large datastore built from any text data during inference. A widely held hypothesis for the success of kNN-LM is that its explicit memory, i.e., the datastore, enhances predictions for long-tail phenomena. However, prior works have primarily shown its ability to retrieve long-tail contexts, leaving the model’s performance in estimating the probabilities of long-tail target tokens during inference underexplored. In this paper, we investigate the behavior of kNN-LM on low-frequency tokens, examining prediction probability, retrieval accuracy, and token distribution in the datastore. Our experimental results reveal that kNN-LM does not improve prediction performance for low-frequency tokens but mainly benefits high-frequency tokens, regardless of long-tail contexts in the datastore.
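The interpolation at the core of kNN-LM can be sketched as follows, assuming the commonly used formulation in which retrieved neighbors vote for their target tokens with weights exp(-d / T) and the resulting distribution is mixed with the base LM distribution by a coefficient λ. The distances, probabilities, and function names are hypothetical toy values, not results from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def knn_distribution(neighbors: List[Tuple[str, float]], temperature: float = 1.0) -> Dict[str, float]:
    """p_kNN(w) proportional to the sum of exp(-d / T) over neighbors whose target is w."""
    weights: Dict[str, float] = defaultdict(float)
    for token, distance in neighbors:
        weights[token] += math.exp(-distance / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def interpolate(p_lm: Dict[str, float], p_knn: Dict[str, float], lam: float) -> Dict[str, float]:
    """p(w) = lam * p_kNN(w) + (1 - lam) * p_LM(w) over the union of both supports."""
    vocab = set(p_lm) | set(p_knn)
    return {w: lam * p_knn.get(w, 0.0) + (1.0 - lam) * p_lm.get(w, 0.0) for w in vocab}


if __name__ == "__main__":
    neighbors = [("the", 1.0), ("the", 1.0), ("cat", 1.0)]
    p_lm = {"the": 0.5, "cat": 0.2, "dog": 0.3}
    p = interpolate(p_lm, knn_distribution(neighbors), lam=0.5)
    print({w: round(p[w], 3) for w in sorted(p)})  # {'cat': 0.267, 'dog': 0.15, 'the': 0.583}
```

A token that is never retrieved (here `dog`) keeps only the (1 − λ)-scaled LM probability, so any benefit from the datastore depends on which tokens actually appear among the neighbors.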

Case-Based Decision-Theoretic Decoding with Quality Memories
Hiroyuki Deguchi | Masaaki Nagata
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Minimum Bayes risk (MBR) decoding is a decision rule for text generation that selects the hypothesis maximizing the expected utility and robustly generates higher-quality texts than maximum a posteriori (MAP) decoding. However, it depends on sample texts drawn from the text generation model; thus, it is difficult to find a hypothesis that correctly captures out-of-domain knowledge or information. To tackle this issue, we propose case-based decision-theoretic (CBDT) decoding, another method of estimating the expected utility, which uses examples of domain data. Not only does CBDT decoding generate higher-quality texts than MAP decoding, but the combination of MBR and CBDT decoding also outperformed MBR decoding in seven domain-specific De–En and Ja↔En translation tasks and in image captioning tasks on the MSCOCO and nocaps datasets.
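One way to sketch a case-based expected utility, assuming the classical case-based decision theory form in which each stored case (example source, example target) contributes the utility of a hypothesis against the example target, weighted by the similarity of the example source to the input. The word-overlap similarity and utility are toy stand-ins, and the memory contents and function names are hypothetical; the abstract does not specify the paper's actual similarity or utility functions.

```python
from typing import Callable, List, Sequence, Tuple

Score = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Toy word-overlap score used here for both similarity and utility."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cbdt_decode(
    source: str,
    hypotheses: Sequence[str],
    memory: List[Tuple[str, str]],
    similarity: Score = jaccard,
    utility: Score = jaccard,
) -> str:
    """Pick the hypothesis with the highest case-based expected utility.

    Each case (example source x_j, example target y_j) contributes
    similarity(source, x_j) * utility(h, y_j).
    """
    def score(h: str) -> float:
        return sum(similarity(source, xs) * utility(h, ys) for xs, ys in memory)

    return max(hypotheses, key=score)


if __name__ == "__main__":
    memory = [
        ("the patient has a fever", "der Patient hat Fieber"),
        ("the weather is nice", "das Wetter ist gut"),
    ]
    hyps = ["der Patient hat Husten", "das Wetter hat Husten"]
    print(cbdt_decode("the patient has a cough", hyps, memory))  # der Patient hat Husten
```

Unlike MBR, the evidence here comes from in-domain examples rather than model samples, which is the property the abstract highlights; normalizing by the total similarity would not change the argmax.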

Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding
Hidetaka Kamigaito | Hiroyuki Deguchi | Yusuke Sakai | Katsuhiko Hayashi | Taro Watanabe
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Inference methods play an important role in eliciting the performance of large language models (LLMs). Many current inference methods for LLMs use multiple generated samples and can be derived from Minimum Bayes Risk (MBR) decoding. Previous studies have conducted empirical analyses to clarify the improvements in generation performance achieved by MBR decoding and have reported various observations. However, the theoretical underpinnings of these findings remain uncertain. To address this, we offer a new theoretical interpretation of MBR decoding from the perspective of bias–diversity decomposition. In this interpretation, the error in the quality estimation of hypotheses by MBR decoding is decomposed into two main factors: bias, which considers the closeness between the utility function and human evaluation, and diversity, which represents the variability in the quality estimation of the utility function. The theoretical analysis reveals the difficulty of simultaneously improving bias and diversity and confirms the validity of enhancing MBR decoding performance by increasing diversity. Furthermore, we reveal that diversity can explain one aspect of inference scaling laws that describe performance improvement by increasing sample size. Moreover, experiments across multiple NLP tasks yielded results consistent with these theoretical characteristics. Our code is available at https://github.com/naist-nlp/mbr-bias-diversity.
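The abstract does not state the formula, but the bias–diversity (ambiguity) decomposition from the ensemble literature has this form: for quality estimates f_1 to f_N of one hypothesis, their mean f̄, and a human score y, (f̄ − y)² = (1/N) Σ (f_i − y)² − (1/N) Σ (f_i − f̄)². The sketch below checks this identity numerically; treating each pseudo-reference utility score as one estimate f_i is an assumption about how it maps onto MBR decoding, and the paper's exact formulation may differ.

```python
from statistics import fmean
from typing import Sequence, Tuple


def bias_diversity(estimates: Sequence[float], target: float) -> Tuple[float, float, float]:
    """Return (error of the mean estimate, bias, diversity).

    estimates: per-pseudo-reference quality estimates f_i of one hypothesis
    target:    the human quality score y for that hypothesis
    The identity error == bias - diversity holds for any inputs.
    """
    mean_estimate = fmean(estimates)
    error = (mean_estimate - target) ** 2
    bias = fmean((f - target) ** 2 for f in estimates)
    diversity = fmean((f - mean_estimate) ** 2 for f in estimates)
    return error, bias, diversity


if __name__ == "__main__":
    err, b, d = bias_diversity([0.6, 0.8, 0.7], target=0.9)
    print(round(err, 4), round(b, 4), round(d, 4))  # 0.04 0.0467 0.0067
```

With bias held fixed, a larger diversity term lowers the error of the averaged estimate, which matches the abstract's point that increasing diversity can improve MBR decoding.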

2024

Document-level Translation with LLM Reranking: Team-J at WMT 2024 General Translation Task
Keito Kudo | Hiroyuki Deguchi | Makoto Morishita | Ryo Fujii | Takumi Ito | Shintaro Ozaki | Koki Natsumi | Kai Sato | Kazuki Yano | Ryosuke Takahashi | Subaru Kimura | Tomomasa Hara | Yusuke Sakai | Jun Suzuki
Proceedings of the Ninth Conference on Machine Translation

We participated in the constrained track for English-Japanese and Japanese-Chinese translations at the WMT 2024 General Machine Translation Task. Our approach was to generate a large number of sentence-level translation candidates and select the most probable translation using minimum Bayes risk (MBR) decoding and document-level large language model (LLM) re-ranking. We first generated hundreds of translation candidates from multiple translation models and retained the top 30 candidates using MBR decoding. In addition, we continually pre-trained LLMs on the target language corpora to leverage document-level information. We utilized LLMs to select the most probable sentence sequentially in context from the beginning of the document.

Centroid-Based Efficient Minimum Bayes Risk Decoding
Hiroyuki Deguchi | Yusuke Sakai | Hidetaka Kamigaito | Taro Watanabe | Hideki Tanaka | Masao Utiyama
Findings of the Association for Computational Linguistics: ACL 2024

Minimum Bayes risk (MBR) decoding achieved state-of-the-art translation performance by using COMET, a neural metric that has a high correlation with human evaluation. However, MBR decoding requires quadratic time since it computes the expected score between a translation hypothesis and all reference translations. We propose centroid-based MBR (CBMBR) decoding to improve the speed of MBR decoding. Our method clusters the reference translations in the feature space and then calculates the score using the centroid of each cluster. The experimental results show that our CBMBR not only sped up the expected score calculation by 5.7 times but also outperformed vanilla MBR decoding in translation quality by up to 0.5 COMET in the WMT’22 En↔Ja, En↔De, En↔Zh, and WMT’23 En↔Ja translation tasks.
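A minimal sketch of centroid-based expected utility, assuming a toy embedding-space utility (negative Euclidean distance) in place of COMET, a plain k-means with first-k initialization, and a uniform average over centroids; these choices and all function names are assumptions rather than the paper's implementation. Each hypothesis is scored against k centroids instead of N references, so utility calls drop from N·N to N·k.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Utility = Callable[[Sequence[float], Sequence[float]], float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 10) -> List[Vector]:
    """Plain k-means; centroids are initialized with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def neg_distance(hyp: Sequence[float], ref: Sequence[float]) -> float:
    """Toy utility on embeddings (stand-in for COMET): closer is better."""
    return -math.dist(hyp, ref)


def cbmbr_decode(
    hyp_embs: Sequence[Sequence[float]],
    ref_embs: Sequence[Sequence[float]],
    k: int,
    utility: Utility = neg_distance,
) -> int:
    """Index of the hypothesis with the best mean utility against k centroids."""
    centroids = kmeans(ref_embs, k)
    scores = [sum(utility(h, c) for c in centroids) / k for h in hyp_embs]
    return max(range(len(hyp_embs)), key=scores.__getitem__)


if __name__ == "__main__":
    embs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [0.0, 0.5]]
    print(kmeans(embs, 2))  # [[0.0, 0.5], [10.0, 10.5]]
    print(cbmbr_decode(embs, embs, k=2))  # 4
```

Clustering is done once per source sentence, so its cost is amortized over all hypotheses; weighting centroids by cluster size would be an alternative design choice.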

mbrs: A Library for Minimum Bayes Risk Decoding
Hiroyuki Deguchi | Yusuke Sakai | Hidetaka Kamigaito | Taro Watanabe
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Detector–Corrector: Edit-Based Automatic Post Editing for Human Post Editing
Hiroyuki Deguchi | Masaaki Nagata | Taro Watanabe
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

Post-editing is crucial in the real world because neural machine translation (NMT) sometimes makes errors. Automatic post-editing (APE) attempts to correct the outputs of an MT model for better translation quality. However, many APE models are based on sequence generation, and thus their decisions are harder for actual users to interpret. In this paper, we propose “detector–corrector”, an edit-based post-editing model, which breaks the editing process into two steps: error detection and error correction. The detector model tags each MT output token according to whether it should be corrected and/or reordered, while the corrector model generates corrected words for the spans identified as errors by the detector. Experiments on the WMT’20 English–German and English–Chinese APE tasks showed that our detector–corrector improved the translation edit rate (TER) compared to the previous edit-based model and a black-box sequence-to-sequence APE model. In addition, our model is more explainable because it is based on edit operations.

2023

NAIST-NICT WMT’23 General MT Task Submission
Hiroyuki Deguchi | Kenji Imamura | Yuto Nishida | Yusuke Sakai | Justin Vasselli | Taro Watanabe
Proceedings of the Eighth Conference on Machine Translation

In this paper, we describe our NAIST-NICT submission to the WMT’23 English ↔ Japanese general machine translation task. Our system generates diverse translation candidates and reranks them using a two-stage reranking system to find the best translation. First, we generated 50 candidates each from 18 translation methods, using a variety of techniques to increase the diversity of the translation candidates. We trained seven models per language direction using various combinations of hyperparameters. From these models, we generated candidates using various decoding algorithms, model ensembling, and kNN-MT (Khandelwal et al., 2021). We processed the 900 translation candidates through a two-stage reranking system to find the most promising candidate. In the first step, we compared the 50 candidates from each translation method using DrNMT (Lee et al., 2021) and returned the candidate with the best score. We then ranked the final 18 candidates using COMET-MBR (Fernandes et al., 2022) and returned the best-scoring candidate as the system output. We found that generating diverse translation candidates improved translation quality when combined with the well-designed reranker model.

Subset Retrieval Nearest Neighbor Machine Translation
Hiroyuki Deguchi | Taro Watanabe | Yusuke Matsui | Masao Utiyama | Hideki Tanaka | Eiichiro Sumita
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

k-nearest-neighbor machine translation (kNN-MT) (Khandelwal et al., 2021) boosts the translation performance of trained neural machine translation (NMT) models by incorporating example search into the decoding algorithm. However, decoding is extremely slow, roughly 100 to 1,000 times slower than standard NMT, because neighbor tokens are retrieved from all target tokens of the parallel data at each timestep. In this paper, we propose “Subset kNN-MT”, which improves the decoding speed of kNN-MT with two methods: (1) retrieving neighbor target tokens from a subset, namely the set of neighbor sentences of the input sentence, rather than from all sentences, and (2) an efficient distance computation technique, suitable for subset neighbor search, that uses a look-up table. Our proposed method achieved a speed-up of up to 132.2 times and an improvement in BLEU score of up to 1.6 compared with kNN-MT in the WMT’19 De-En translation task and the domain adaptation tasks in De-En and En-Ja.
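The second method relies on a look-up table for distance computation. A common way to do this, and the assumption in this sketch, is product-quantization asymmetric distance computation: target-token vectors are stored as codes into M codebooks, a per-query table of subvector-to-centroid distances is built once, and the distance to any token in the retrieved subset is then a sum of M table lookups. The codebooks, codes, and function names are hypothetical toy values, not the paper's implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Code = Tuple[int, ...]


def split(v: Vector, m: int) -> List[List[float]]:
    """Split v into m equal-length subvectors."""
    sub = len(v) // m
    return [list(v[i * sub:(i + 1) * sub]) for i in range(m)]


def build_lut(query: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """lut[j][c]: squared L2 distance from the j-th query subvector to centroid c of codebook j."""
    return [
        [sum((q - x) ** 2 for q, x in zip(sub_q, centroid)) for centroid in codebook]
        for sub_q, codebook in zip(split(query, len(codebooks)), codebooks)
    ]


def adc_distance(lut: List[List[float]], code: Code) -> float:
    """Approximate squared distance to an encoded vector: M lookups plus a sum."""
    return sum(lut[j][c] for j, c in enumerate(code))


def subset_search(
    query: Vector,
    codebooks: Sequence[Sequence[Vector]],
    subset_codes: Sequence[Code],
    k: int,
) -> List[int]:
    """Indices of the k nearest encoded target tokens within the retrieved subset."""
    lut = build_lut(query, codebooks)
    order = sorted(range(len(subset_codes)), key=lambda i: adc_distance(lut, subset_codes[i]))
    return order[:k]


if __name__ == "__main__":
    codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
    subset = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(subset_search([1.0, 1.0, 0.0, 0.0], codebooks, subset, k=2))  # [1, 0]
```

Because the subset changes with every input sentence, a precomputed global index is less useful there, whereas a per-query table stays cheap to build and is reused across all tokens in the subset.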

2022

NAIST-NICT-TIT WMT22 General MT Task Submission
Hiroyuki Deguchi | Kenji Imamura | Masahiro Kaneko | Yuto Nishida | Yusuke Sakai | Justin Vasselli | Huy Hien Vu | Taro Watanabe
Proceedings of the Seventh Conference on Machine Translation (WMT)

In this paper, we describe our NAIST-NICT-TIT submission to the WMT22 general machine translation task. We participated in this task for the English ↔ Japanese language pair. Our system is characterized as an ensemble of Transformer big models, k-nearest-neighbor machine translation (kNN-MT) (Khandelwal et al., 2021), and reranking. In our translation system, we construct the datastore for kNN-MT from back-translated monolingual data and integrate kNN-MT into the ensemble model. We designed a reranking system to select a translation from the n-best translation candidates generated by the translation system. We also use a context-aware model to improve the document-level consistency of the translation.

2021

Synchronous Syntactic Attention for Transformer Neural Machine Translation
Hiroyuki Deguchi | Akihiro Tamura | Takashi Ninomiya
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

This paper proposes a novel attention mechanism for Transformer Neural Machine Translation, “Synchronous Syntactic Attention,” inspired by synchronous dependency grammars. The mechanism synchronizes source-side and target-side syntactic self-attentions by minimizing the difference between target-side self-attentions and the source-side self-attentions mapped by the encoder-decoder attention matrix. The experiments show that the proposed method improves the translation performance on WMT14 En-De, WMT16 En-Ro, and ASPEC Ja-En (up to +0.38 points in BLEU).

2020

Bilingual Subword Segmentation for Neural Machine Translation
Hiroyuki Deguchi | Masao Utiyama | Akihiro Tamura | Takashi Ninomiya | Eiichiro Sumita
Proceedings of the 28th International Conference on Computational Linguistics

This paper proposes a new subword segmentation method for neural machine translation, “Bilingual Subword Segmentation,” which tokenizes sentences to minimize the difference between the number of subword units in a sentence and that of its translation. While existing subword segmentation methods tokenize a sentence without considering its translation, the proposed method tokenizes a sentence by using subword units induced from bilingual sentences; this method could be more favorable to machine translation. Evaluations on the WAT Asian Scientific Paper Excerpt Corpus (ASPEC) English-to-Japanese and Japanese-to-English translation tasks and the WMT14 English-to-German and German-to-English translation tasks show that our bilingual subword segmentation improves the performance of Transformer neural machine translation (up to +0.81 BLEU).

2019

Dependency-Based Self-Attention for Transformer NMT
Hiroyuki Deguchi | Akihiro Tamura | Takashi Ninomiya
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we propose a new Transformer neural machine translation (NMT) model that incorporates dependency relations into self-attention on both the source and target sides, called dependency-based self-attention. The dependency-based self-attention is trained to attend to the modifiee of each token under constraints based on the dependency relations, inspired by Linguistically-Informed Self-Attention (LISA). While LISA was originally proposed for the Transformer encoder for semantic role labeling, this paper extends LISA to Transformer NMT by masking future information on words in the decoder-side dependency-based self-attention. Additionally, our dependency-based self-attention operates on subword units created by byte pair encoding. The experiments show that our model improves BLEU by 1.0 point over the baseline model on the WAT’18 Asian Scientific Paper Excerpt Corpus Japanese-to-English translation task.
