Takehito Utsuro

2026

Better Generalizing to Unseen Concepts: An Evaluation Framework and An LLM-Based Auto-Labeled Pipeline for Biomedical Concept Recognition
Shanshan Liu | Noriki Nishida | Fei Cheng | Narumi Tokunaga | Rumana Ferdous Munne | Yuki Yamagata | Kouji Kozaki | Takehito Utsuro | Yuji Matsumoto
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Generalization to unseen concepts is a central challenge due to the scarcity of human annotations in Mention-agnostic Biomedical Concept Recognition (MA-BCR). This work makes two key contributions to systematically address this issue. First, we propose an evaluation framework built on hierarchical concept indices and novel metrics to measure generalization. Second, we explore LLM-based Auto-Labeled Data (ALD) as a scalable resource, creating a task-specific pipeline for its generation. Our research unequivocally shows that while LLM-generated ALD cannot fully substitute for manual annotations, it is a valuable resource for improving generalization, successfully providing models with the broader coverage and structural knowledge needed to approach recognizing unseen concepts. Code and datasets are available at https://github.com/bio-ie-tool/hi-ald.

2025

pdf bib abs

BiMax: Bidirectional MaxSim Score for Document-Level Alignment
Xiaotian Wang | Takehito Utsuro | Masaaki Nagata
Findings of the Association for Computational Linguistics: EMNLP 2025

Document alignment is necessary for the hierarchical mining, which aligns documents across source and target languages within the same web domain. Several high-precision sentence embedding-based methods have been developed, such as TK-PERT and Optimal Transport (OT). However, given the massive scale of web mining data, both accuracy and speed must be considered.In this paper, we propose a cross-lingual Bidirectional Maxsim score (BiMax) for computing doc-to-doc similarity,to improve efficiency compared to the OT method.Consequently, on the WMT16 bilingual document alignment task,BiMax attains accuracy comparable to OT with an approximate 100-fold speed increase.Meanwhile, we also conduct a comprehensive analysis to investigate the performance of current state-of-the-art multilingual sentence embedding models.

pdf bib abs

RAG based Question Answering of Korean Laws and Precedents
Kiho Seo | Takehito Utsuro
Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)

We propose a method of improving the performance of question answering based on the interpretation of criminal law regulations in the Korean language by using large language models. In this study, we develop a system that accumulates legislative texts and case precedents related to criminal procedures published on the Internet.The system searches for relevant legal provisions and precedents related to the queryunder the RAG (Retrieval-Augmented Generation) framework.It generates accurate responses to questions by conducting reasoning through large language modelsbased on these relevant laws and precedents. As an application example of this system, it can be utilized to support decision makingin investigations and legal interpretation scenarios within the field of Korean criminal law.

pdf bib abs

Improving Japanese-English Patent Claim Translation with Clause Segmentation Models based on Word Alignment
Masato Nishimura | Kosei Buma | Takehito Utsuro | Masaaki Nagata
Proceedings of Machine Translation Summit XX: Volume 1

In patent documents, patent claims represent a particularly important section as they define the scope of the claims. However, due to the length and unique formatting of these sentences, neural machine translation (NMT) systems are prone to translation errors, such as omissions and repetitions. To address these challenges, this study proposes a translation method that first segments the source sentences into multiple shorter clauses using a clause segmentation model tailored to facilitate translation. These segmented clauses are then translated using a clause translation model specialized for clause-level translation. Finally, the translated clauses are rearranged and edited into the final translation using a reordering and editing model. In addition, this study proposes a method for constructing clause-level parallel corpora required for training the clause segmentation and clause translation models. This method leverages word alignment tools to create clause-level data from sentence-level parallel corpora. Experimental results demonstrate that the proposed method achieves statistically significant improvements in BLEU scores compared to conventional NMT models. Furthermore, for sentences where conventional NMT models exhibit omissions and repetitions, the proposed method effectively suppresses these errors, enabling more accurate translations.

pdf bib abs

This paper presents the submission of NTTSU for the constrained track of the English–Japanese and Japanese–Chinese at the WMT2025 general translation task.For each translation direction, we build translation models from a large language model by combining continual pretraining, supervised fine-tuning, and preference optimization based on the translation quality and adequacy.We finally generate translations via context-aware MBR decoding to maximize translation quality and document-level consistency.

pdf bib abs

A Dialogue System for Semi-Structured Interviews by LLMs and its Evaluation on Persona Information Collection
Ryo Hasegawa | Yijie Hua | Takehito Utsuro | Ekai Hashimoto | Mikio Nakano | Shun Shiramatsu
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology

In this paper, we propose a dialogue control management framework using large language models for semi-structured interviews. Specifically, large language models are used to generate the interviewer’s utterances and to make conditional branching decisions based on the understanding of the interviewee’s responses. The framework enables flexible dialogue control in interview conversations by generating and updating slots and values according to interviewee answers. More importantly, we invented through LLMs’ prompt tuning the framework of accumulating the list of slots generated along the course of incrementing the number of interviewees through the semi-structured interviews. Evaluation results showed that the proposed approach of accumulating the list of generated slots throughout the semi-structured interviews outperform the baseline without accumulating generated slots in terms of the number of persona attributes and values collected through the semi-structured interview.

pdf bib abs

Generating Financial News Articles from Factors of Stock Price Rise / Decline by LLMs
Shunsuke Nishida | Takehito Utsuro
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

In this paper, we study the task of generating financial news articles related to stock price fluctuations. Traditionally, reporters manually write these articles by identifying the causes behind significant stock price volatility. However, this process is time-consuming, limiting the number of articles produced. To address this, the study explores the use of generative AI to automatically generate such articles. The AI system, similar to human reporters, would analyze stock price volatility and determine the underlying factors contributing to these fluctuations. To support this approach, we introduces a Japanese dataset called JFinSR, which includes stock price fluctuation rankings from “Kabutan” and related financial information regarding factors of stock price rise / decline from “Nihon Keizai Shimbun (Nikkei).” Using this dataset, we implement the few-shot learning technique on large language models (LLMs) to enable automatic generation of high-quality articles from factors of stock price rise / decline that are available in Nikkei. In the evaluation, we compare zero-shot and few-shot learning approaches, where the few-shot learning achieved the higher F1 scores in terms of ROUGE-1/ROUGE-L metrics.

pdf bib abs

Patent Claim Translation via Continual Pre-training of Large Language Models with Parallel Data
Haruto Azami | Minato Kondo | Takehito Utsuro | Masaaki Nagata
Proceedings of Machine Translation Summit XX: Volume 1

Recent advancements in large language models (LLMs) have enabled their application across various domains. However, in the field of patent translation, Transformer encoder-decoder based models remain the standard approach, and the potential of LLMs for translation tasks has not been thoroughly explored. In this study, we conducted patent claim translation using an LLM fine-tuned with parallel data through continual pre-training and supervised fine-tuning, following the methodology proposed by Guo et al. (2024) and Kondo et al. (2024). Comparative evaluation against the Transformer encoder-decoder based translations revealed that the LLM achieved high scores for both BLEU and COMET. This demonstrated improvements in addressing issues such as omissions and repetitions. Nonetheless, hallucination errors, which were not observed in the traditional models, occurred in some cases and negatively affected the translation quality. This study highlights the promise of LLMs for patent translation while identifying the challenges that warrant further investigation.

pdf bib abs

UTSK25 at WAT2025 Patent Claims Translation/Evaluation Task
Haruto Azami | Yin Zhang | Futo Kajita | Nobuyori Nishimura | Takehito Utsuro
Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025)

This paper presents the submission of UTSK25 for the English–Japanese and Japanese–English at the WAT2025 Patent Claims Translation/Evaluation Task. We use a single translation model for both translation directions, built from a large language model through monolingual and bilingual continual pretraining and bilingual supervised fine-tuning. We finally generate translations via prompt engineering to reduce omissions and hallucinations.

pdf bib abs

Two Step Automatic Post Editing of Patent Machine Translation based on Pre-trained Encoder Models and LLMs
Kosei Buma | Takehito Utsuro | Masaaki Nagata
The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

We study automatic post-editing for patent translation, where accuracy and traceability are critical, and propose a two-step pipeline that combines a multilingual encoder for token-level error detection with an LLM for targeted correction. As no word-level annotations exist for Japanese–English patents, we create supervised data by injecting synthetic errors into parallel patent sentences and fine-tune mBERT, XLM-RoBERTa, and mDeBERTa as detectors. In the second stage, GPT-4o is prompted to revise translations either freely or under a restricted policy that allows edits only on detector-marked spans. For error detection, evaluation on synthetic errors shows that encoder-based detectors outperform LLMs in both F1 and MCC. For error correction, tests on synthetic, repetition, and omission datasets demonstrate statistically significant BLEU gains over LLM methods for synthetic and repetition errors, while omission errors remain challenging. Overall, pairing compact encoders with an LLM enables more accurate and controllable post-editing for key patent error types, reducing unnecessary rewrites via restricted edits. Future work will focus on strengthening omission modeling to better detect and correct missing content.

2024

pdf bib abs

Embedded Topic Models Enhanced by Wikification
Takashi Shibuya | Takehito Utsuro
Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia

Topic modeling analyzes a collection of documents to learn meaningful patterns of words.However, previous topic models consider only the spelling of words and do not take into consideration the polysemy of words.In this study, we incorporate the Wikipedia knowledge into a neural topic model to make it aware of named entities.We evaluate our method on two datasets, 1) news articles of New York Times and 2) the AIDA-CoNLL dataset.Our experiments show that our method improves the performance of neural topic models in generalizability.Moreover, we analyze frequent words in each topic and the temporal dependencies between topics to demonstrate that our entity-aware topic models can capture the time-series development of topics well.

pdf bib abs

The NTTSU team’s submission leverages several large language models developed through a training procedure that includes continual pre-training and supervised fine-tuning. For paragraph-level translation, we generated synthetic paragraph-aligned data and utilized this data for training.In the task of translating Japanese to Chinese, we particularly focused on the speech domain translation. Specifically, we built Whisper models for Japanese automatic speech recognition (ASR). We used YODAS dataset for Whisper training. Since this data contained many noisy data pairs, we combined the Whisper outputs using ROVER for polishing the transcriptions. Furthermore, to enhance the robustness of the translation model against errors in the transcriptions, we performed data augmentation by forward translation from audio, using both ASR and base translation models.To select the best translation from multiple hypotheses of the models, we applied Minimum Bayes Risk decoding + reranking, incorporating scores such as COMET-QE, COMET, and cosine similarity by LaBSE.

pdf bib abs

Aggregating Impressions on Celebrities and their Reasons from Microblog Posts and Web Search Pages
Hibiki Yokoyama | Rikuto Tsuchida | Kosei Buma | Sho Miyakawa | Takehito Utsuro | Masaharu Yoshioka
Proceedings of the 3rd Workshop on Knowledge Augmented Methods for NLP

This paper aims to augment fans’ ability to critique and exploreinformation related to celebrities of interest. First, we collect postsfrom X (formerly Twitter) that discuss matters related to specificcelebrities. For the collection of major impressions from these posts,we employ ChatGPT as a large language model (LLM) to analyze andsummarize key sentiments. Next, based on collected impressions, wesearch for Web pages and collect the content of the top 30 ranked pagesas the source for exploring the reasons behind those impressions. Oncethe Web page content collection is complete, we collect and aggregatedetailed reasons for the impressions on the celebrities from the contentof each page. For this part, we continue to use ChatGPT, enhanced bythe retrieval augmented generation (RAG) framework, to ensure thereliability of the collected results compared to relying solely on theprior knowledge of the LLM. Evaluation results by comparing a referencethat is manually collected and aggregated reasons with those predictedby ChatGPT revealed that ChatGPT achieves high accuracy in reasoncollection and aggregation. Furthermore, we compared the performance ofChatGPT with an existing model of mT5 in reason collection and confirmedthat ChatGPT exhibits superior performance.

Takehito Utsuro

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2000

1998

1997

1996

1994

1993

1992

Co-authors

Venues