Nobuaki Minematsu

Also published as: N. Minematsu


2026

In academic research, post-presentation Q&A sessions are crucial for deepening understanding and shaping research directions. Supervisors’ comments are particularly valuable when they highlight perspectives that students have not yet fully considered. Such comments typically arise from careful reasoning within dialogue, yet large language models (LLMs) still struggle to reason precisely about dialogue context and communicative intentions. Building on LLMs, this study proposes a feedback generation framework based on the Belief–Desire–Intention (BDI) model, which conceptualizes Q&A sessions as cognitive interactions between presenters and questioners. We further extend this framework into BI-R by introducing Respect as an explicit dimension, ensuring that generated feedback is not only accurate but also pedagogically constructive. We evaluated the proposed framework (BDI and BI-R) through comparative experiments with master’s students and field experiments with doctoral students during pre-defense presentations. Results showed that while the BDI prompt did not outperform the baseline, the BI-R prompt was particularly effective when students did not fully grasp the broader context or background of the questions. When comparing BDI and BI-R, the inclusion of Respect improved the tone and pedagogical appropriateness of feedback. These findings highlight the potential of the proposed framework as a supportive tool for training students and early-career researchers.

2025

We present Re:Member, a system that explores how emotionally expressive, memory-grounded interaction can support more engaging second language (L2) learning. By drawing on users’ personal videos and generating stylized spoken questions in the target language, Re:Member is designed to encourage affective recall and conversational engagement. The system aligns emotional tone with visual context, using expressive speech styles such as whispers or late-night tones to evoke specific moods. It combines WhisperX-based transcript alignment, 3-frame visual sampling, and Style-BERT-VITS2 for emotional synthesis within a modular generation pipeline. Designed as a stylized interaction probe, Re:Member highlights the role of affect and personal media in learner-centered educational technologies.
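The abstract describes a modular generation pipeline (transcript alignment, frame sampling, style-conditioned synthesis). The sketch below shows one way such an orchestration could be wired together; all function names and the mood-to-style mapping are hypothetical placeholders, not the actual Re:Member, WhisperX, or Style-BERT-VITS2 APIs.

```python
def align_transcript(video_path):
    """Placeholder for WhisperX-style word-level transcript alignment
    (hypothetical output format)."""
    return [{"word": "beach", "start": 1.2, "end": 1.6}]

def sample_frames(video_path, n=3):
    """Placeholder for sampling n representative frames from the video,
    mirroring the 3-frame visual sampling described in the abstract."""
    return [f"{video_path}#frame{i}" for i in range(n)]

def choose_style(mood):
    """Assumed mapping from an inferred mood to an expressive speech style,
    e.g. whispers or late-night tones."""
    return {"nostalgic": "whisper", "calm": "late-night"}.get(mood, "neutral")

def generate_question(video_path, mood="nostalgic"):
    """Sketch of the pipeline's stages: align, sample frames, pick a style.

    A real system would pass the result to an emotional TTS model; here we
    just return the intermediate artifacts.
    """
    words = align_transcript(video_path)
    frames = sample_frames(video_path)
    style = choose_style(mood)
    return {
        "style": style,
        "n_frames": len(frames),
        "keywords": [w["word"] for w in words],
    }
```

The point of the modular structure is that each stage (alignment, sampling, style selection, synthesis) can be swapped independently, which is what makes the system usable as an interaction probe.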

2022

Language models (LMs) have played a crucial role in automatic speech recognition (ASR) by enhancing the performance of end-to-end (E2E) ASR systems. Approaches fall into two categories: finding better ways to integrate LMs into ASR systems, and adapting LMs to the task domain. This article starts with a reflection on interpolation-based methods for integrating E2E ASR scores with LM scores. We then focus on LM augmentation approaches based on the noisy channel model, motivated by insights obtained from that reflection. Our experiments show that an encoder-decoder E2E ASR model can be enhanced by pre-training its decoder on text data. This implies that the decoder of an E2E model can be treated as an LM, and reveals the possibility of enhancing the E2E model without an external LM. Based on these ideas, we propose an implicit language model canceling method and further discuss the role of the decoder in an E2E ASR model. Experimental results on the TED-LIUM2 dataset show that our approach achieves a 3.4% relative WER reduction over the baseline system, and additional analytic experiments provide concrete support for our assumption.
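Interpolation-based integration (commonly called shallow fusion) combines the E2E model's score for a hypothesis with a weighted LM score at decoding time. A minimal sketch, using made-up per-token log-probabilities and an assumed interpolation weight `lam`:

```python
def shallow_fusion_score(asr_logprobs, lm_logprobs, lam=0.3):
    """Score one hypothesis by interpolating E2E ASR and LM log-probabilities.

    For a hypothesis y of length T, the fused score is
        sum over t of [ log P_asr(y_t | ...) + lam * log P_lm(y_t | ...) ],
    where lam is the LM interpolation weight (tuned on held-out data).
    """
    assert len(asr_logprobs) == len(lm_logprobs)
    return sum(a + lam * b for a, b in zip(asr_logprobs, lm_logprobs))

# Toy example: rescore two competing hypotheses with fused scores.
hyp_a = shallow_fusion_score([-0.1, -0.2], [-0.5, -0.4], lam=0.3)  # -0.57
hyp_b = shallow_fusion_score([-0.3, -0.1], [-0.2, -0.3], lam=0.3)  # -0.55
best = "A" if hyp_a > hyp_b else "B"  # LM evidence flips the ranking to B
```

In practice this fused score is computed incrementally inside beam search rather than over complete hypotheses; the noisy-channel and implicit-LM-canceling views discussed in the abstract modify which scores enter this sum, not the interpolation mechanics themselves.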

2012

2011

2002

2000