Ryu Takeda


2024

Collecting Human-Agent Dialogue Dataset with Frontal Brain Signal toward Capturing Unexpressed Sentiment
Shun Katada | Ryu Takeda | Kazunori Komatani
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multimodal information such as text and audiovisual data has been used for emotion/sentiment estimation during human-agent dialogue; however, users do not necessarily express their sentiments explicitly during dialogues. Biosignals such as brain signals recorded with an electroencephalogram (EEG) sensor have attracted attention in affective computing as a way to capture unexpressed emotional changes in controlled experimental environments. In this study, we collect and analyze multimodal data, including EEG, during human-agent dialogue toward capturing unexpressed sentiment. Our contributions are as follows: (1) We create a new multimodal human-agent dialogue dataset that includes not only text and audiovisual data but also frontal EEGs and physiological signals recorded during the dialogue. In total, about 500 minutes of chat dialogues were collected from thirty participants aged 20 to 70. (2) We present a novel method for removing eye-blink noise from frontal EEGs. This method applies facial landmark tracking to detect and delete eye-blink noise. (3) An experimental evaluation showed the effectiveness of the frontal EEGs: although they have only three channels, they improved sentiment estimation performance when combined with other modalities through multimodal fusion.
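The abstract does not say how facial landmarks are turned into blink decisions. The following is a minimal sketch, not the authors' method. It assumes the commonly used eye aspect ratio (EAR) computed from six eye landmarks per video frame, a hypothetical blink threshold of 0.2, and EEG samples aligned to video frames by a fixed samples-per-frame ratio. All function names and parameters are hypothetical.

```python
import math

Point = tuple[float, float]


def eye_aspect_ratio(eye: list[Point]) -> float:
    """EAR from six landmarks p1..p6 ordered around one eye (assumed layout).

    p1/p4 are the eye corners; (p2, p6) and (p3, p5) are vertical pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def blink_frames(ears: list[float], threshold: float = 0.2) -> list[bool]:
    """Flag a frame as a blink when its EAR falls below a hypothetical threshold."""
    return [ear < threshold for ear in ears]


def delete_blink_samples(
    eeg: list[list[float]],
    blinks: list[bool],
    samples_per_frame: int,
    margin: int = 1,
) -> list[list[float]]:
    """Drop EEG samples lying within `margin` frames of a detected blink.

    eeg is a list of channels, each a list of samples aligned to the video.
    """
    n_frames = len(blinks)
    bad = [False] * n_frames
    for i, is_blink in enumerate(blinks):
        if is_blink:
            # Widen each blink by `margin` frames on both sides.
            for j in range(max(0, i - margin), min(n_frames, i + margin + 1)):
                bad[j] = True
    n_samples = len(eeg[0])
    keep = [
        not bad[min(s // samples_per_frame, n_frames - 1)] for s in range(n_samples)
    ]
    return [[x for x, k in zip(channel, keep) if k] for channel in eeg]
```

Deleting samples, as the abstract describes, shortens the signal rather than reconstructing it; the margin is an assumed safeguard because blink artifacts in frontal EEG usually outlast the frames where the eye is visibly closed.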

2023

Analyzing Differences in Subjective Annotations by Participants and Third-party Annotators in Multimodal Dialogue Corpus
Kazunori Komatani | Ryu Takeda | Shogo Okada
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Estimating the subjective impressions of human users during a dialogue is necessary when constructing a dialogue system that can respond adaptively to their emotional states. However, such subjective impressions (e.g., how much the user enjoys the dialogue) are inherently ambiguous, and the annotation results provided by multiple annotators do not always agree because they depend on the subjectivity of the annotators. In this paper, we analyzed the annotation results using 13,226 exchanges from 155 participants in Hazumi, a multimodal dialogue corpus that we constructed, where each exchange was annotated by five third-party annotators. We investigated the agreement between the subjective annotations given by the third-party annotators and those given by the participants themselves, for both per-exchange annotations (i.e., the participant's sentiments) and per-dialogue (i.e., per-participant) annotations (questionnaires on rapport and personality traits). We also investigated the conditions under which the annotation results are reliable. Our findings demonstrate that the dispersion of third-party sentiment annotations correlates with the participants' agreeableness, one of the Big Five personality traits.

2017

Unsupervised Segmentation of Phoneme Sequences based on Pitman-Yor Semi-Markov Model using Phoneme Length Context
Ryu Takeda | Kazunori Komatani
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Unsupervised segmentation of phoneme sequences is an essential step for acquiring unknown words during spoken dialogues. In this segmentation, an input phoneme sequence without delimiters is converted into segmented sub-sequences corresponding to words. The Pitman-Yor semi-Markov model (PYSMM) is promising for this problem, but its performance degrades when it is applied to phoneme-level word segmentation. This is because the cues for segmentation are insufficient at the phoneme level; e.g., homophones are improperly treated as single entries, and their different contexts are confused. We propose a phoneme-length context model for PYSMM that provides a helpful cue at the phoneme level and predicts succeeding segments more accurately. Our experiments showed that the peak performance with our context model exceeded that without it by up to 0.045 in F-measure of the estimated segmentation.
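Full PYSMM inference is typically sampling-based and learns the lexicon jointly with the segmentation. As a much simpler illustration of the semi-Markov search over an undelimited phoneme sequence, here is a Viterbi sketch that finds the highest-scoring segmentation under a fixed, hypothetical table of word log-probabilities, scoring out-of-lexicon segments with an assumed per-phoneme penalty. It implements neither the Pitman-Yor prior nor the paper's phoneme-length context model; `word_logp`, `unk_logp_per_phoneme`, and `max_len` are all hypothetical.

```python
import math


def viterbi_segment(
    phonemes: list[str],
    word_logp: dict[tuple[str, ...], float],
    unk_logp_per_phoneme: float = -5.0,
    max_len: int = 6,
) -> list[tuple[str, ...]]:
    """Best segmentation of `phonemes` into words of length <= max_len."""
    n = len(phonemes)
    best = [-math.inf] * (n + 1)  # best[i]: best log score of phonemes[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            segment = tuple(phonemes[start:end])
            # Known words use the table; unknown segments pay a length penalty.
            score = word_logp.get(segment, unk_logp_per_phoneme * len(segment))
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Trace back from the end of the sequence to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(phonemes[start:end]))
        end = start
    return segments[::-1]
```

The length penalty for unknown segments is the only length cue here; the abstract's point is that richer phoneme-length context helps where such simple cues fail, e.g., for homophones.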

Lexical Acquisition through Implicit Confirmations over Multiple Dialogues
Kohei Ono | Ryu Takeda | Eric Nichols | Mikio Nakano | Kazunori Komatani
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We address the problem of acquiring the ontological categories of unknown terms through implicit confirmation in dialogues. We develop an approach that makes implicit confirmation requests containing an unknown term's predicted category. Our approach avoids degrading the user experience with repetitive explicit confirmations, but the system then has difficulty determining whether the information in the confirmation request has been correctly acquired. To overcome this challenge, we propose a method for determining whether or not the predicted category included in an implicit confirmation request is correct. Our method exploits multiple user responses to implicit confirmation requests containing the same ontological category. Experimental results revealed that the proposed method achieved higher precision in identifying correctly predicted categories than when only single user responses were considered.
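A minimal sketch of pooling evidence over multiple responses, assuming some per-response classifier (not shown, hypothetical) already outputs the probability that a single user response accepts the predicted category. Averaging log-odds per (term, category) pair and thresholding at 0 are hypothetical design choices, not the paper's method.

```python
import math
from collections import defaultdict


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clamped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def judge_categories(
    responses: list[tuple[str, str, float]],
    threshold: float = 0.0,
) -> dict[tuple[str, str], bool]:
    """Decide per (term, category) whether the prediction looks correct.

    Each response is (term, predicted_category, p_accept), where p_accept is
    an assumed per-response acceptance probability.
    """
    pooled: dict[tuple[str, str], list[float]] = defaultdict(list)
    for term, category, p_accept in responses:
        pooled[(term, category)].append(logit(p_accept))
    return {key: sum(v) / len(v) > threshold for key, v in pooled.items()}
```

Pooling lets one ambiguous reply (e.g., a p_accept near 0.4) be outweighed by clearer replies to other requests carrying the same category.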

2016

Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words
Ryu Takeda | Kazunori Komatani
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper describes a Bayesian language model for predicting spontaneous utterances. People sometimes say unexpected words, such as fillers or hesitations, that cause mispredictions in normal N-gram models. Our proposed model considers mixtures of possible segmental contexts, which amounts to a kind of context-word selection. It can reduce the negative effects of unexpected words because it represents the conditional occurrence probability of a word as a weighted mixture over possible segmental contexts. Tuning the mixture weights is the key issue in this approach because the number of segment patterns becomes large; we resolve this by using a Bayesian model. The generative process combines the stick-breaking process with the process used in the variable-order Pitman-Yor language model. Experimental evaluations revealed that our model outperformed contiguous N-gram models in terms of perplexity on noisy text including hesitations.
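The mixture itself can be illustrated with a small sketch. Assumptions: there are K candidate segmental contexts in some fixed (hypothetical) order, stick proportions are drawn from a Beta distribution, and the last context takes the remaining stick mass. The per-context word probabilities are placeholders standing in for the variable-order Pitman-Yor estimates the paper uses, and all function names are hypothetical.

```python
import random


def stick_breaking_weights(proportions: list[float]) -> list[float]:
    """pi_k = v_k * prod_{j<k} (1 - v_j); a final weight takes the remainder."""
    weights = []
    remaining = 1.0
    for v in proportions:
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def sample_stick_proportions(k: int, a: float, b: float, rng: random.Random) -> list[float]:
    """Draw k - 1 Beta(a, b) proportions, yielding k mixture weights."""
    return [rng.betavariate(a, b) for _ in range(k - 1)]


def mixture_prob(context_probs: list[float], weights: list[float]) -> float:
    """P(w) = sum_k pi_k * P(w | context_k)."""
    if len(context_probs) != len(weights):
        raise ValueError("need one weight per segmental context")
    return sum(w * p for w, p in zip(weights, context_probs))
```

For example, proportions [0.5, 0.5] give weights [0.5, 0.25, 0.25], so context probabilities [0.2, 0.4, 0.8] mix to 0.4. A context that skips a filler can thus still carry most of the probability mass when the filler makes the contiguous context unreliable.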