Ao Guo


2025

Exploring Factors Influencing Hospitality in Mobile Robot Guidance: A Wizard-of-Oz Study with a Teleoperated Humanoid Robot
Ao Guo | Shota Mochizuki | Sanae Yamashita | Saya Nikaido | Tomoko Isomura | Ryuichiro Higashinaka
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Developing mobile robots that can provide guidance with high hospitality remains challenging, as it requires the coordination of spoken interaction, physical navigation, and user engagement. To gain insights that contribute to the development of such robots, we conducted a Wizard-of-Oz (WOZ) study using Teleco, a teleoperated humanoid robot, to explore the factors influencing hospitality in mobile robot guidance. Specifically, we enrolled 30 participants as visitors and two trained operators, who teleoperated the Teleco robot to provide mobile guidance to the participants. A total of 120 dialogue sessions were collected, along with evaluations from both the participants and the operators regarding the hospitality of each interaction. To identify the factors that influence hospitality in mobile guidance, we analyzed the collected dialogues from two perspectives: linguistic usage and multimodal robot behaviors. We first clustered system utterances and analyzed how frequently each utterance category appeared in high- and low-satisfaction dialogues. The results showed that short responses appeared more frequently in high-satisfaction dialogues. Moreover, we observed a general increase in participant satisfaction over successive sessions, along with shifts in linguistic usage, suggesting a mutual adaptation effect between operators and participants. We also conducted a time-series analysis of multimodal robot behaviors to explore behavioral patterns potentially linked to hospitable interactions.
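
A minimal sketch of this kind of utterance-clustering analysis is given below. The paper does not specify its embedding or clustering method, so TF-IDF features and k-means are stand-ins, and the utterances and satisfaction labels are illustrative.

```python
# Hypothetical sketch: cluster system utterances, then compare how often
# each cluster (utterance category) appears in high- vs. low-satisfaction
# dialogues. TF-IDF + k-means are assumptions, not the paper's method.
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# (system utterance, satisfaction group of its dialogue) -- toy data
utterances = [
    ("Right this way, please.", "high"),
    ("Sure!", "high"),
    ("Okay.", "high"),
    ("The exhibit you asked about is on the second floor.", "low"),
    ("Please follow me to the information desk over there.", "low"),
    ("Our opening hours are from nine to five on weekdays.", "low"),
]

texts = [u for u, _ in utterances]
X = TfidfVectorizer().fit_transform(texts)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Frequency of each utterance category per satisfaction group
freq = {"high": Counter(), "low": Counter()}
for (_, group), c in zip(utterances, clusters):
    freq[group][c] += 1
print(freq)  # e.g. short responses dominating the "high" group
```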

Integrating Physiological, Speech, and Textual Information Toward Real-Time Recognition of Emotional Valence in Dialogue
Jingjing Jiang | Ao Guo | Ryuichiro Higashinaka
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Accurately estimating users’ emotional states in real time is crucial for enabling dialogue systems to respond adaptively. While existing approaches primarily rely on verbal information, such as text and speech, these modalities are often unavailable in non-speaking situations. In such cases, non-verbal information, particularly physiological signals, becomes essential for understanding users’ emotional states. In this study, we aimed to develop a model for real-time recognition of users’ binary emotional valence (high-valence vs. low-valence) during conversations. Specifically, we utilized an existing Japanese multimodal dialogue dataset, which includes various physiological signals, namely electrodermal activity (EDA), blood volume pulse (BVP), photoplethysmography (PPG), and pupil diameter, along with speech and textual data. We classify the emotional valence of every 15-second segment of dialogue interaction by integrating these multimodal inputs. To this end, time-series embeddings of physiological signals are extracted using a self-supervised encoder, while speech and textual features are obtained from pre-trained Japanese HuBERT and BERT models, respectively. The modality-specific embeddings are integrated using a feature fusion mechanism for emotional valence recognition. Experimental results show that while each modality individually contributes to emotion recognition, the inclusion of physiological signals leads to a notable performance improvement, particularly in non-speaking or minimally verbal situations. These findings underscore the importance of physiological information for enhancing real-time valence recognition in dialogue systems, especially when verbal information is limited.
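
A minimal PyTorch sketch of such a fusion classifier is given below. The embedding dimensions, the concatenation-based fusion, and the pooling to one vector per 15-second segment are all assumptions; the paper’s self-supervised physiological encoder and exact fusion mechanism are not reproduced here.

```python
# Hypothetical sketch of a modality-fusion valence classifier. Dimensions
# and the concatenation fusion are assumptions, not the paper's design.
import torch
import torch.nn as nn

class ValenceFusionClassifier(nn.Module):
    def __init__(self, physio_dim=128, speech_dim=768, text_dim=768, hidden=256):
        super().__init__()
        # One projection per modality so the fused space is balanced
        self.physio_proj = nn.Linear(physio_dim, hidden)
        self.speech_proj = nn.Linear(speech_dim, hidden)  # e.g. HuBERT features
        self.text_proj = nn.Linear(text_dim, hidden)      # e.g. BERT features
        self.head = nn.Sequential(
            nn.ReLU(), nn.Linear(3 * hidden, 2)  # binary valence: high vs. low
        )

    def forward(self, physio, speech, text):
        fused = torch.cat(
            [self.physio_proj(physio), self.speech_proj(speech), self.text_proj(text)],
            dim=-1,
        )
        return self.head(fused)

# One 15-second segment -> pooled per-modality embeddings -> valence logits
model = ValenceFusionClassifier()
logits = model(torch.randn(1, 128), torch.randn(1, 768), torch.randn(1, 768))
print(logits.shape)  # torch.Size([1, 2])
```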

2024

Estimating the Emotional Valence of Interlocutors Using Heterogeneous Sensors in Human-Human Dialogue
Jingjing Jiang | Ao Guo | Ryuichiro Higashinaka
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Dialogue systems need to accurately understand the user’s mental state to generate appropriate responses, but discerning such states solely from text or speech can be challenging. To determine which information is necessary, we first collected human-human multimodal dialogues using heterogeneous sensors, resulting in a dataset containing various types of information including speech, video, physiological signals, gaze, and body movement. Additionally, for each time step of the data, users provided subjective evaluations of their emotional valence while reviewing the dialogue videos. Using this dataset and focusing on physiological signals, we analyzed the relationship between the signals and the subjective evaluations through Granger causality analysis. We also investigated how sensor signals differ depending on the polarity of the valence. Our findings revealed several physiological signals related to the user’s emotional valence.
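
For readers unfamiliar with the technique, the sketch below runs a Granger causality test of the kind described above, with synthetic data standing in for the actual physiological signals and valence ratings; the paper’s lag settings and preprocessing are not reproduced.

```python
# Illustrative Granger causality test on synthetic data. statsmodels
# tests whether the SECOND column helps predict the FIRST one.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 200
eda = rng.normal(size=n)  # stand-in for an electrodermal activity signal
# Valence built to lag the signal by two steps, so causality should show up
valence = np.roll(eda, 2) + rng.normal(scale=0.5, size=n)

data = np.column_stack([valence, eda])  # [effect, candidate cause]
results = grangercausalitytests(data, maxlag=4)
# Small p-values around lag 2 indicate the signal Granger-causes valence
```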

2023

RealPersonaChat: A Realistic Persona Chat Corpus with Interlocutors’ Own Personalities
Sanae Yamashita | Koji Inoue | Ao Guo | Shota Mochizuki | Tatsuya Kawahara | Ryuichiro Higashinaka
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

Applying Item Response Theory to Task-oriented Dialogue Systems for Accurately Determining User’s Task Success Ability
Ryu Hirai | Ao Guo | Ryuichiro Higashinaka
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

While task-oriented dialogue systems have improved, not all users can fully accomplish their tasks. Users with limited knowledge about the system may experience dialogue breakdowns or fail to achieve their tasks because they do not know how to interact with the system. To address this issue, it would be desirable to construct a system that can estimate the user’s task success ability and adapt to that ability. In this study, we propose a method that estimates this ability by applying item response theory (IRT), commonly used in education for estimating examinee abilities, to task-oriented dialogue systems. In experiments predicting the probability of a correct answer for each slot using the estimated task success ability, we found that the proposed method significantly outperformed the baselines.
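
As a concrete illustration, the sketch below estimates a user’s ability under the two-parameter logistic (2PL) IRT model, a standard formulation in which the probability of success is 1 / (1 + exp(-a(theta - b))). The paper’s exact IRT variant and the parameter values here are assumptions.

```python
# Hypothetical 2PL IRT sketch: theta is the user's task success ability,
# a each slot's discrimination, b its difficulty. Values are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

def p_correct(theta, a, b):
    """Probability of answering a slot correctly under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Per-slot item parameters and one user's observed successes (1/0)
a = np.array([1.2, 0.8, 1.5])
b = np.array([-0.5, 0.3, 1.0])
y = np.array([1, 1, 0])

# Maximum-likelihood estimate of the user's ability theta
def neg_log_lik(theta):
    p = p_correct(theta, a, b)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

theta_hat = minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x
print(f"estimated task success ability: {theta_hat:.2f}")
```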

2021

Influence of user personality on dialogue task performance: A case study using a rule-based dialogue system
Ao Guo | Atsumoto Ohashi | Ryu Hirai | Yuya Chiba | Yuiko Tsunomori | Ryuichiro Higashinaka
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Endowing a task-oriented dialogue system with adaptiveness to user personality can greatly help improve the performance of a dialogue task. However, such a dialogue system can be challenging to implement in practice because it is unclear how user personality influences dialogue task performance. To explore the relationship between user personality and dialogue task performance, we enrolled participants via crowdsourcing to first answer specified personality questionnaires and then chat with a dialogue system to accomplish assigned tasks. A rule-based dialogue system on the prevalent Multi-Domain Wizard-of-Oz (MultiWOZ) task was used. A total of 211 participants’ personalities and their 633 dialogues were collected and analyzed. The results revealed that sociable and extroverted people tended to fail the task, whereas neurotic people were more likely to succeed. We extracted features related to user dialogue behaviors and performed further analysis to determine which kinds of behavior influence task performance. As a result, we identified average utterance length and slots per utterance as the key dialogue-behavior features, both highly correlated with task performance and user personality.
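
A hypothetical sketch of this kind of behavior-feature analysis follows: correlating average utterance length with binary task success via a point-biserial correlation. The data is synthetic and the choice of statistic is an assumption; the paper’s actual pipeline and results are not reproduced.

```python
# Illustrative correlation of a dialogue-behavior feature with task success.
# Synthetic data; point-biserial correlation is a stand-in analysis choice.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(1)
n_users = 211
avg_utt_len = rng.normal(loc=12, scale=4, size=n_users)  # tokens per utterance
# Synthetic rule: longer utterances slightly lower the odds of task success
success = (rng.random(n_users) < 1 / (1 + np.exp(0.2 * (avg_utt_len - 12)))).astype(int)

r, p = pointbiserialr(success, avg_utt_len)
print(f"point-biserial r = {r:.2f}, p = {p:.3g}")
```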