Nobuaki Minematsu

Also published as: N. Minematsu


2026

In academic research, post-presentation Q&A sessions are crucial for deepening understanding and shaping research directions. Supervisors’ comments are particularly valuable when they highlight perspectives that students have not yet fully considered. Such comments typically arise from careful reasoning within dialogue, yet large language models (LLMs) still struggle to reason precisely about dialogue context and communicative intentions. Building on LLMs, this study proposes a feedback generation framework based on the Belief–Desire–Intention (BDI) model, which conceptualizes Q&A sessions as cognitive interactions between presenters and questioners. We further extend this framework into BI-R by introducing Respect as an explicit dimension, ensuring that generated feedback is not only accurate but also pedagogically constructive. We evaluated the proposed framework (BDI and BI-R) through comparative experiments with master’s students and field experiments with doctoral students during pre-defense presentations. Results showed that while the BDI prompt did not outperform the baseline, the BI-R prompt was particularly effective when students did not fully grasp the broader context or background of the questions. When comparing BDI and BI-R, the inclusion of Respect improved the tone and pedagogical appropriateness of feedback. These findings highlight the potential of the proposed framework as a supportive tool for training students and early-career researchers.

2025

We present Re:Member, a system that explores how emotionally expressive, memory-grounded interaction can support more engaging second language (L2) learning. By drawing on users’ personal videos and generating stylized spoken questions in the target language, Re:Member is designed to encourage affective recall and conversational engagement. The system aligns emotional tone with visual context, using expressive speech styles such as whispers or late-night tones to evoke specific moods. It combines WhisperX-based transcript alignment, 3-frame visual sampling, and Style-BERT-VITS2 for emotional synthesis within a modular generation pipeline. Designed as a stylized interaction probe, Re:Member highlights the role of affect and personal media in learner-centered educational technologies.
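The abstract describes a modular generation pipeline (transcript alignment, frame sampling, style-conditioned synthesis). The sketch below shows one way such an orchestration could be wired together; all function names and the mood-to-style mapping are hypothetical placeholders, not the actual Re:Member, WhisperX, or Style-BERT-VITS2 APIs.

```python
def align_transcript(video_path):
    """Placeholder for WhisperX-style word-level transcript alignment
    (hypothetical output format)."""
    return [{"word": "beach", "start": 1.2, "end": 1.6}]

def sample_frames(video_path, n=3):
    """Placeholder for sampling n representative frames from the video,
    mirroring the 3-frame visual sampling described in the abstract."""
    return [f"{video_path}#frame{i}" for i in range(n)]

def choose_style(mood):
    """Assumed mapping from an inferred mood to an expressive speech style,
    e.g. whispers or late-night tones."""
    return {"nostalgic": "whisper", "calm": "late-night"}.get(mood, "neutral")

def generate_question(video_path, mood="nostalgic"):
    """Sketch of the pipeline's stages: align, sample frames, pick a style.

    A real system would pass the result to an emotional TTS model; here we
    just return the intermediate artifacts.
    """
    words = align_transcript(video_path)
    frames = sample_frames(video_path)
    style = choose_style(mood)
    return {
        "style": style,
        "n_frames": len(frames),
        "keywords": [w["word"] for w in words],
    }
```

The point of the modular structure is that each stage (alignment, sampling, style selection, synthesis) can be swapped independently, which is what makes the system usable as an interaction probe.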

2022

Language models (LMs) have played a crucial role in automatic speech recognition (ASR) by enhancing the performance of end-to-end (E2E) ASR systems. Approaches fall into two categories: finding better ways to integrate LMs into ASR systems, and adapting LMs to the task domain. This article starts with a reflection on interpolation-based methods for integrating E2E ASR scores with LM scores. We then focus on LM augmentation approaches based on the noisy channel model, motivated by insights obtained from that reflection. Our experiments show that an encoder-decoder E2E ASR model can be enhanced by pre-training its decoder on text data. This implies that the decoder of an E2E model can be treated as an LM, and reveals the possibility of enhancing the E2E model without an external LM. Based on these ideas, we propose an implicit language model canceling method and further discuss the role of the decoder in an E2E ASR model. Experimental results on the TED-LIUM2 dataset show that our approach achieves a 3.4% relative WER reduction over the baseline system, and additional analytic experiments provide concrete support for our assumption.
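Interpolation-based integration (commonly called shallow fusion) combines the E2E model's score for a hypothesis with a weighted LM score at decoding time. A minimal sketch, using made-up per-token log-probabilities and an assumed interpolation weight `lam`:

```python
def shallow_fusion_score(asr_logprobs, lm_logprobs, lam=0.3):
    """Score one hypothesis by interpolating E2E ASR and LM log-probabilities.

    For a hypothesis y of length T, the fused score is
        sum over t of [ log P_asr(y_t | ...) + lam * log P_lm(y_t | ...) ],
    where lam is the LM interpolation weight (tuned on held-out data).
    """
    assert len(asr_logprobs) == len(lm_logprobs)
    return sum(a + lam * b for a, b in zip(asr_logprobs, lm_logprobs))

# Toy example: rescore two competing hypotheses with fused scores.
hyp_a = shallow_fusion_score([-0.1, -0.2], [-0.5, -0.4], lam=0.3)  # -0.57
hyp_b = shallow_fusion_score([-0.3, -0.1], [-0.2, -0.3], lam=0.3)  # -0.55
best = "A" if hyp_a > hyp_b else "B"  # LM evidence flips the ranking to B
```

In practice this fused score is computed incrementally inside beam search rather than over complete hypotheses; the noisy-channel and implicit-LM-canceling views discussed in the abstract modify which scores enter this sum, not the interpolation mechanics themselves.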

2012

2011

2002

2000