Masaki Asada

2026

Principled Self-Correction in Discrete Diffusion: A UCB-Guided Framework for Text Generation
Masaki Asada | Makoto Miwa
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Inspired by their success in image synthesis, diffusion models offer a flexible, iterative alternative to rigid left-to-right text generation. However, a fundamental training-inference discrepancy hinders their performance: models are trained on corrupted ground-truth tokens, but at inference time they must denoise inputs corrupted from their own predictions. To bridge this gap, we propose a unified framework. First, Deeper Self-Prediction (DSP) is a multi-step training objective that teaches robust self-correction by forcing the model to denoise its own intermediate outputs. Second, UCB-guided Decoding is a principled inference algorithm that frames token re-masking as a multi-armed bandit problem, using the Upper Confidence Bound (UCB) to balance exploration and exploitation. Experiments on text generation tasks demonstrate consistent improvements over existing diffusion baselines. The framework achieves higher faithfulness and coherence according to both automatic metrics and LLM-as-a-Judge evaluations.
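
As a rough illustration of the bandit framing mentioned in the abstract, the sketch below applies the standard UCB1 score (mean reward plus an exploration bonus) to pick which token positions to re-mask. It is not the paper's algorithm: treating each position as an arm, the reward definition, the constant `c`, and the function names `ucb_scores` and `select_positions_to_remask` are all assumptions made for this example.

```python
import math

def ucb_scores(mean_rewards, counts, total_steps, c=1.0):
    """Standard UCB1 score per arm: mean + c * sqrt(2 * ln(t) / n).

    Assumption: each token position is one arm, and mean_rewards[i] is a
    hypothetical running estimate of how much re-masking position i helped.
    """
    t = max(total_steps, 1)  # avoid log(0)
    scores = []
    for mean, n in zip(mean_rewards, counts):
        if n == 0:
            # Arms that were never tried get explored first.
            scores.append(math.inf)
        else:
            scores.append(mean + c * math.sqrt(2.0 * math.log(t) / n))
    return scores

def select_positions_to_remask(mean_rewards, counts, total_steps, k, c=1.0):
    """Return the k positions with the highest UCB score, in ascending order."""
    scores = ucb_scores(mean_rewards, counts, total_steps, c)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

The exploration bonus shrinks as a position is re-masked more often, so rarely revisited positions are still tried occasionally, while positions with a high estimated reward are favored.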

2025

We propose ELAINE (EngLish-jApanese-chINesE)-medLLM, a trilingual (English, Japanese, Chinese) large language model adapted for the biomedical domain based on Llama-3-8B. The training dataset was carefully curated in terms of volume and diversity to adapt the model to the biomedical domain and endow it with trilingual capability while preserving the knowledge and abilities of the base model. Training follows a two-stage path: continued pre-training followed by supervised fine-tuning (SFT). Our results demonstrate that ELAINE-medLLM exhibits superior trilingual capabilities compared to existing bilingual or multilingual medical LLMs without severely sacrificing the base model’s capability.

ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding
Kimihiro Hasegawa | Wiradee Imrattanatrai | Zhi-Qi Cheng | Masaki Asada | Susan Holm | Yuran Wang | Ken Fukuda | Teruko Mitamura
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Multimodal systems have great potential to assist humans in procedural activities, where people follow instructions to achieve their goals. Despite diverse application scenarios, systems are typically evaluated on traditional classification tasks, e.g., action recognition or temporal action localization. In this paper, we present a novel evaluation dataset, ProMQA, to measure the advancement of systems in application-oriented scenarios. ProMQA consists of 401 multimodal procedural QA pairs on user recordings of procedural activities, i.e., cooking, coupled with their corresponding instructions. For QA annotation, we take a cost-effective human-LLM collaborative approach, where the existing annotation is augmented with LLM-generated QA pairs that are later verified by humans. We then provide benchmark results to set the baseline performance on ProMQA. Our experiment reveals a significant gap between human performance and that of current systems, including competitive proprietary multimodal models. We hope our dataset sheds light on new aspects of models’ multimodal understanding capabilities.

Addressing the Training-Inference Discrepancy in Discrete Diffusion for Text Generation
Masaki Asada | Makoto Miwa
Proceedings of the 31st International Conference on Computational Linguistics

This study addresses the discrepancy between training and inference in discrete diffusion models for text generation. We propose two novel strategies: (1) a training scheme that considers two-step diffusion processes, allowing the model to use its own predicted output as input for subsequent steps during training, and (2) a scheduling technique that gradually increases the probability of using self-generated text as training progresses. Experiments conducted on four widely used text generation benchmark datasets demonstrate that both proposed strategies improve the performance of discrete diffusion models in text generation.
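
The scheduling idea in strategy (2) can be pictured with a minimal sketch. The abstract only says the probability increases gradually; the linear ramp, the `max_prob` cap, and the names `self_prediction_prob` and `choose_input` below are assumptions for illustration, not the schedule used in the paper.

```python
import random

def self_prediction_prob(step, total_steps, max_prob=1.0):
    """Probability of feeding the model its own prediction at a training step.

    Assumption: a linear ramp from 0 at step 0 to max_prob at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return max_prob * frac

def choose_input(corrupted_gold, corrupted_self, step, total_steps, rng=random):
    """Pick the training input: corrupted gold text early, self-generated text later."""
    p = self_prediction_prob(step, total_steps)
    return corrupted_self if rng.random() < p else corrupted_gold
```

Early in training the model mostly sees corrupted ground-truth text; as `step` approaches `total_steps`, it increasingly sees inputs derived from its own predictions, which is closer to the inference-time condition.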

Improving Relation Extraction by Sequence-to-sequence-based Dependency Parsing Pre-training
Masaki Asada | Makoto Miwa
Proceedings of the 31st International Conference on Computational Linguistics

Relation extraction is a crucial natural language processing task that extracts relational triplets from raw text. Syntactic dependency information has proven effective for relation extraction tasks. However, in most existing studies, dependency information is used only for traditional encoder-only-based relation extraction, not for generative sequence-to-sequence (seq2seq)-based relation extraction. In this study, we propose a syntax-aware seq2seq pre-trained model for seq2seq-based relation extraction. The model incorporates dependency information into a seq2seq pre-trained language model by continual pre-training with a seq2seq-based dependency parsing task. Experimental results on two widely used relation extraction benchmark datasets show that dependency parsing pre-training can improve relation extraction performance.

2023

BioNART: A Biomedical Non-AutoRegressive Transformer for Natural Language Generation
Masaki Asada | Makoto Miwa
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

We propose a novel biomedical domain-specific Non-AutoRegressive Transformer model for natural language generation: BioNART. BioNART is based on an encoder-decoder model, and both the encoder and the decoder are compatible with the widely used BERT architecture, which allows the model to benefit from publicly available pre-trained biomedical language model checkpoints. We performed additional pre-training and fine-tuned BioNART on biomedical summarization and doctor-patient dialogue tasks. Experimental results show that BioNART achieves about 94% of the ROUGE score of the pre-trained autoregressive model while achieving 18 times faster inference on the iCliniq dataset.

2018

Enhancing Drug-Drug Interaction Extraction from Texts by Molecular Structure Information
Masaki Asada | Makoto Miwa | Yutaka Sasaki
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a novel neural method to extract drug-drug interactions (DDIs) from texts using external drug molecular structure information. We encode textual drug pairs with convolutional neural networks and their molecular pairs with graph convolutional networks (GCNs), and then we concatenate the outputs of these two networks. In the experiments, we show that GCNs can predict DDIs from the molecular structures of drugs with high accuracy and that the molecular information can enhance text-based DDI extraction by 2.39 percentage points in F-score on the DDIExtraction 2013 shared task data set.

2017

Extracting Drug-Drug Interactions with Attention CNNs
Masaki Asada | Makoto Miwa | Yutaka Sasaki
Proceedings of the 16th BioNLP Workshop

We propose a novel attention mechanism for a Convolutional Neural Network (CNN)-based Drug-Drug Interaction (DDI) extraction model. CNNs have shown great potential for DDI extraction tasks; however, attention mechanisms, which emphasize important words in the sentence containing a target-entity pair, have not been investigated with CNNs, even though attention mechanisms have been shown to be effective for general-domain relation classification. We evaluated our model on Task 9.2 of the DDIExtraction-2013 shared task. Our attention mechanism improved the performance of our base CNN-based DDI model, and the model achieved an F-score of 69.12%, which is competitive with state-of-the-art models.