2024
pdf
bib
abs
Multilingual Turn-taking Prediction Using Voice Activity Projection
Koji Inoue
|
Bing’er Jiang
|
Erik Ekstedt
|
Tatsuya Kawahara
|
Gabriel Skantze
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, on multilingual data, encompassing English, Mandarin, and Japanese. The VAP model continuously predicts the upcoming voice activities of participants in dyadic dialogue, leveraging a cross-attention Transformer to capture the dynamic interplay between participants. The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages. However, a multilingual model, trained on all three languages, demonstrates predictive performance on par with monolingual models across all languages. Further analyses show that the multilingual model has learned to discern the language of the input signal. We also analyze the sensitivity to pitch, a prosodic cue that is thought to be important for turn-taking. Finally, we compare two different audio encoders, contrastive predictive coding (CPC) pre-trained on English, with a recent model based on multilingual wav2vec 2.0 (MMS).
2023
pdf
bib
abs
Challenges and Approaches in Designing Social SDS in the LLM Era
Koji Inoue
Proceedings of the 19th Annual Meeting of the Young Reseachers' Roundtable on Spoken Dialogue Systems
Large language models (LLMs) have brought about a significant transformation in spoken dialogue systems (SDSs). It is anticipated that these systems will be implemented into diverse robotic applications and employed in a variety of social settings. The author presents research interest with the aim of realizing social SDSs from multiple perspectives, including task design, turn-taking mechanisms, and evaluation methodologies. Additionally, future research in social SDSs should delve into a deeper understanding of user mental states and a relationship with society via multi-party conversations. Finally, the author suggests topics for discussion regarding the future directions of SDS researchers in the LLM era.
pdf
bib
abs
Reasoning before Responding: Integrating Commonsense-based Causality Explanation for Empathetic Response Generation
Yahui Fu
|
Koji Inoue
|
Chenhui Chu
|
Tatsuya Kawahara
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Recent approaches to empathetic response generation try to incorporate commonsense knowledge or reasoning about the causes of emotions to better understand the user’s experiences and feelings. However, these approaches mainly focus on understanding the causalities of context from the user’s perspective, ignoring the system’s perspective. In this paper, we propose a commonsense-based causality explanation approach for diverse empathetic response generation that considers both the user’s perspective (user’s desires and reactions) and the system’s perspective (system’s intentions and reactions). We enhance ChatGPT’s ability to reason for the system’s perspective by integrating in-context learning with commonsense knowledge. Then, we integrate the commonsense-based causality explanation with both ChatGPT and a T5-based model. Experimental evaluations demonstrate that our method outperforms other comparable methods on both automatic and human evaluations.
pdf
bib
RealPersonaChat: A Realistic Persona Chat Corpus with Interlocutors’ Own Personalities
Sanae Yamashita
|
Koji Inoue
|
Ao Guo
|
Shota Mochizuki
|
Tatsuya Kawahara
|
Ryuichiro Higashinaka
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation
2022
pdf
bib
abs
Simultaneous Job Interview System Using Multiple Semi-autonomous Agents
Haruki Kawai
|
Yusuke Muraki
|
Kenta Yamamoto
|
Divesh Lala
|
Koji Inoue
|
Tatsuya Kawahara
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
In recent years, spoken dialogue systems have been applied to job interviews where an applicant talks to a system that asks pre-defined questions, called on-demand and self-paced job interviews. We propose a simultaneous job interview system, where one interviewer can conduct one-on-one interviews with multiple applicants simultaneously by cooperating with the multiple autonomous job interview dialogue systems. However, it is challenging for interviewers to monitor and understand all the parallel interviews done by the autonomous system at the same time. As a solution to this issue, we implemented two automatic dialogue understanding functions: (1) response evaluation of each applicant’s responses and (2) keyword extraction as a summary of the responses. It is expected that interviewers, as needed, can intervene in one dialogue and smoothly ask a proper question that elaborates the interview. We report a pilot experiment where an interviewer conducted simultaneous job interviews with three candidates.
2021
pdf
bib
abs
A multi-party attentive listening robot which stimulates involvement from side participants
Koji Inoue
|
Hiromi Sakamoto
|
Kenta Yamamoto
|
Divesh Lala
|
Tatsuya Kawahara
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
We demonstrate the moderating abilities of a multi-party attentive listening robot system when multiple people are speaking in turns. Our conventional one-on-one attentive listening system generates listener responses such as backchannels, repeats, elaborating questions, and assessments. In this paper, additional robot responses that stimulate a listening user (side participant) to become more involved in the dialogue are proposed. The additional responses elicit assessments and questions from the side participant, making the dialogue more empathetic and lively.
2020
pdf
bib
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Olivier Pietquin
|
Smaranda Muresan
|
Vivian Chen
|
Casey Kennington
|
David Vandyke
|
Nina Dethlefs
|
Koji Inoue
|
Erik Ekstedt
|
Stefan Ultes
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
pdf
bib
abs
An Attentive Listening System with Android ERICA: Comparison of Autonomous and WOZ Interactions
Koji Inoue
|
Divesh Lala
|
Kenta Yamamoto
|
Shizuka Nakamura
|
Katsuya Takanashi
|
Tatsuya Kawahara
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
We describe an attentive listening system for the autonomous android robot ERICA. The proposed system generates several types of listener responses: backchannels, repeats, elaborating questions, assessments, generic sentimental responses, and generic responses. In this paper, we report a subjective experiment with 20 elderly people. First, we evaluated each system utterance excluding backchannels and generic responses, in an offline manner. It was found that most of the system utterances were linguistically appropriate, and they elicited positive reactions from the subjects. Furthermore, 58.2% of the responses were acknowledged as being appropriate listener responses. We also compared the proposed system with a WOZ system where a human operator was operating the robot. From the subjective evaluation, the proposed system achieved comparable scores in basic skills of attentive listening such as encouragement to talk, focused on the talk, and actively listening. It was also found that there is still a gap between the system and the WOZ for more sophisticated skills such as dialogue understanding, showing interest, and empathy towards the user.
2017
pdf
bib
abs
Attentive listening system with backchanneling, response generation and flexible turn-taking
Divesh Lala
|
Pierrick Milhorat
|
Koji Inoue
|
Masanari Ishida
|
Katsuya Takanashi
|
Tatsuya Kawahara
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Attentive listening systems are designed to let people, especially senior people, keep talking to maintain communication ability and mental health. This paper addresses key components of an attentive listening system which encourages users to talk smoothly. First, we introduce continuous prediction of end-of-utterances and generation of backchannels, rather than generating backchannels after end-point detection of utterances. This improves subjective evaluations of backchannels. Second, we propose an effective statement response mechanism which detects focus words and responds in the form of a question or partial repeat. This can be applied to any statement. Moreover, a flexible turn-taking mechanism is designed which uses backchannels or fillers when the turn-switch is ambiguous. These techniques are integrated into a humanoid robot to conduct attentive listening. We test the feasibility of the system in a pilot experiment and show that it can produce coherent dialogues during conversation.
2016
pdf
bib
Talking with ERICA, an autonomous android
Koji Inoue
|
Pierrick Milhorat
|
Divesh Lala
|
Tianyu Zhao
|
Tatsuya Kawahara
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue