2025
ECC: Synergizing Emotion, Cause and Commonsense for Empathetic Dialogue Generation
Xu Wang | Bo Wang | Yihong Tang | Dongming Zhao | Jing Liu | Ruifang He | Yuexian Hou
Proceedings of the 31st International Conference on Computational Linguistics
Empathy improves human-machine dialogue systems by enhancing the user’s experience. While traditional models have aimed to detect and express users’ emotions from dialogue history, they neglect the crucial and complex interactions among emotion, emotion causes, and commonsense. To address this, we introduce the ECC (Emotion, Cause, and Commonsense) framework, which leverages specialized encoders to capture the key features of emotion, cause, and commonsense, and collaboratively models these factors through a Conditional Variational Auto-Encoder. ECC further employs novel loss functions to refine the interplay of the three factors and generates empathetic responses using an energy-based model supported by ODE sampling. Empirical results on the EmpatheticDialogues dataset demonstrate that ECC outperforms existing baselines, offering a robust solution for empathetic dialogue generation.
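To make the collaborative modeling step concrete, here is a minimal PyTorch sketch of how three specialized encoders might feed a CVAE's recognition network. All module choices, dimensions, and names (e.g. `ECCSketch`, the GRU encoders) are illustrative assumptions, not the paper's implementation.

```python
# A hedged sketch: three factor-specific encoders feed a CVAE posterior.
import torch
import torch.nn as nn

class ECCSketch(nn.Module):
    def __init__(self, d_model=256, d_latent=64):
        super().__init__()
        # One encoder per factor: emotion, cause, commonsense (GRUs assumed).
        self.enc_emotion = nn.GRU(d_model, d_model, batch_first=True)
        self.enc_cause = nn.GRU(d_model, d_model, batch_first=True)
        self.enc_commonsense = nn.GRU(d_model, d_model, batch_first=True)
        # Recognition network: fused features -> Gaussian posterior over z.
        self.to_mu = nn.Linear(3 * d_model, d_latent)
        self.to_logvar = nn.Linear(3 * d_model, d_latent)

    def forward(self, emo, cause, cs):
        # Use each encoder's final hidden state as that factor's summary.
        _, h_e = self.enc_emotion(emo)
        _, h_c = self.enc_cause(cause)
        _, h_k = self.enc_commonsense(cs)
        fused = torch.cat([h_e[-1], h_c[-1], h_k[-1]], dim=-1)
        mu, logvar = self.to_mu(fused), self.to_logvar(fused)
        # Reparameterization trick: sample z for the response decoder.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar
```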
RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems
Yihong Tang | Bo Wang | Xu Wang | Dongming Zhao | Jing Liu | Ruifang He | Yuexian Hou
Proceedings of the 31st International Conference on Computational Linguistics
Role-playing systems powered by large language models (LLMs) have become increasingly influential in emotional communication applications. However, these systems are susceptible to character hallucinations, where the model deviates from predefined character roles and generates responses inconsistent with the intended persona. This paper presents the first systematic analysis of character hallucination from an attack perspective, introducing the RoleBreak framework. Our framework identifies two core mechanisms, query sparsity and role-query conflict, as the key factors driving character hallucination. Leveraging these insights, we construct a novel dataset, RoleBreakEval, to evaluate existing hallucination mitigation techniques. Our experiments reveal that even enhanced models trained to minimize hallucination remain vulnerable to attacks. To address these vulnerabilities, we propose a novel defence strategy, the Narrator Mode, which generates supplemental context through narration to mitigate role-query conflicts and improve query generalization. Experimental results demonstrate that Narrator Mode significantly outperforms traditional refusal-based strategies by reducing hallucinations, enhancing fidelity to both character roles and queries, and improving overall narrative coherence.
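As a rough illustration of the Narrator Mode idea, the sketch below wraps a potentially conflicting query in generated narration before answering in character. The prompt wording and the `generate` callable are hypothetical assumptions for illustration, not the paper's actual prompts or interface.

```python
# A hedged sketch of Narrator Mode: instead of refusing a conflicting query,
# generate narration that reconciles the query with the persona, then reply
# in character using that narration as supplemental context.
def narrator_mode(persona: str, query: str, generate) -> str:
    """`generate` is any text-generation callable (hypothetical interface)."""
    narration_prompt = (
        f"You are a narrator. The character is: {persona}\n"
        f"The user asks: {query}\n"
        "Write brief narration that plausibly connects this query to the "
        "character's world, without breaking the persona."
    )
    narration = generate(narration_prompt)
    reply_prompt = (
        f"Stay strictly in character: {persona}\n"
        f"Narration: {narration}\n"
        f"User: {query}\n"
        "Respond in character, consistent with the narration."
    )
    return generate(reply_prompt)
```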
2024
MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space
Yihong Tang | Bo Wang | Dongming Zhao | Jinxiaojia Jinxiaojia | Zhangjijun Zhangjijun | Ruifang He | Yuexian Hou
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Personalized Dialogue Generation (PDG) aims to create coherent responses according to roles or personas. Traditional PDG relies on external role data, which can be scarce and raise privacy concerns. Some approaches address these issues by extracting role information from dialogue history, but they often fail to model roles generically in continuous space. To overcome these limitations, we introduce MORPHEUS, a novel framework that Models Roles from Personalized Dialogue History by Exploring and Utilizing Latent Space through a three-stage training process. Specifically, we create a persona codebook to compactly represent roles in latent space, and this codebook is used to construct a posterior distribution of role information. This method enables the model to generalize across roles, allowing the generation of personalized dialogues even for unseen roles. Experiments on both Chinese and English datasets demonstrate that MORPHEUS enhances the extraction of role information and improves response generation without external role data. Additionally, MORPHEUS can be considered an efficient fine-tuning method for large language models.
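A persona codebook of this kind can be pictured as a vector-quantization step over latent role vectors. The following is a minimal sketch under that assumption; the codebook size, distance metric, and straight-through gradient trick are illustrative choices, not necessarily those of MORPHEUS.

```python
# A hedged sketch: latent role vectors extracted from dialogue history are
# snapped to their nearest persona code, giving a compact role representation.
import torch
import torch.nn as nn

class PersonaCodebook(nn.Module):
    def __init__(self, n_codes=128, d_latent=64):
        super().__init__()
        self.codes = nn.Embedding(n_codes, d_latent)  # learnable persona codes

    def forward(self, role_latent):
        # role_latent: (batch, d_latent). Find each vector's nearest code (L2).
        dists = torch.cdist(role_latent, self.codes.weight)
        idx = dists.argmin(dim=-1)
        quantized = self.codes(idx)
        # Straight-through estimator so gradients still reach the encoder.
        return role_latent + (quantized - role_latent).detach(), idx
```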
ItiNera: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning
Yihong Tang | Zhaokai Wang | Ao Qu | Yihao Yan | Zhaofeng Wu | Dingyi Zhuang | Jushi Kai | Kebing Hou | Xiaotong Guo | Jinhua Zhao | Zhan Zhao | Wei Ma
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Citywalk, a recently popular form of urban travel, demands more genuine personalization and finer-grained understanding of user requests than traditional itinerary planning. In this paper, we introduce the novel task of Open-domain Urban Itinerary Planning (OUIP), which generates personalized urban itineraries from user requests expressed in natural language. We then present ItiNera, an OUIP system that integrates spatial optimization with large language models to provide customized urban itineraries based on user needs. The system decomposes user requests, selects candidate points of interest (POIs), orders the POIs with cluster-aware spatial optimization, and generates the itinerary. Experiments on real-world datasets and the performance of the deployed system demonstrate that ItiNera delivers more personalized and spatially coherent itineraries than current solutions. Source code for ItiNera is available at https://github.com/YihongT/ITINERA.
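For intuition, the sketch below shows one simple way cluster-aware spatial ordering could work: cluster the candidate POIs, then visit clusters by centroid proximity with a greedy nearest-neighbor walk inside each. This is a deliberate simplification for illustration, not the released ItiNera implementation (see the repository linked above for the real system).

```python
# A hedged sketch of cluster-aware POI ordering.
import numpy as np
from sklearn.cluster import KMeans

def order_pois(coords: np.ndarray, n_clusters: int = 3) -> list:
    """coords: (N, 2) array of POI coordinates; returns a visiting order."""
    km = KMeans(n_clusters=n_clusters, n_init="auto").fit(coords)
    labels, centroids = km.labels_, km.cluster_centers_
    route, current = [], coords.mean(axis=0)
    remaining = set(range(n_clusters))
    while remaining:
        # Visit the cluster whose centroid is nearest to the current position.
        c = min(remaining, key=lambda k: np.linalg.norm(centroids[k] - current))
        remaining.remove(c)
        members = [i for i, l in enumerate(labels) if l == c]
        while members:  # greedy nearest-neighbor walk inside the cluster
            nxt = min(members, key=lambda i: np.linalg.norm(coords[i] - current))
            route.append(nxt)
            current = coords[nxt]
            members.remove(nxt)
    return route
```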
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
Jiao Ou | Junda Lu | Che Liu | Yihong Tang | Fuzheng Zhang | Di Zhang | Kun Gai
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large language models (LLMs) have achieved remarkable breakthroughs in new dialogue capabilities by leveraging instruction tuning, which refreshes human impressions of dialogue systems. The long-standing goal of dialogue systems is to be human-like enough to establish long-term connections with users. There is therefore an urgent need to evaluate LLMs as human-like dialogue systems. In this paper, we propose DialogBench, a dialogue evaluation benchmark containing 12 dialogue tasks that probe the capabilities LLMs should have as human-like dialogue systems. Specifically, we prompt GPT-4 to generate evaluation instances for each task. We first design a basic prompt based on widely used design principles and then mitigate existing biases to generate higher-quality evaluation instances. Our extensive tests of 26 LLMs on English and Chinese DialogBench show that instruction tuning improves the human likeness of LLMs to a certain extent, but most LLMs still have much room for improvement as human-like dialogue systems. Interestingly, the results also show that positioning LLMs as AI assistants can cause instruction tuning to weaken their perception of human emotions and their mastery of information about human daily life.
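To illustrate the instance-generation setup, a hypothetical sketch: build a per-task prompt and query GPT-4 once per evaluation instance. Both the prompt wording and the `call_gpt4` interface are assumptions for illustration, not the paper's actual prompts.

```python
# A hedged sketch of benchmark instance generation from task descriptions.
def build_instance_prompt(task_name: str, task_description: str,
                          n_choices: int = 4) -> str:
    return (
        f"Task: {task_name}\n"
        f"Description: {task_description}\n"
        f"Generate a multi-turn dialogue and a question testing this ability, "
        f"with {n_choices} answer options and exactly one correct answer. "
        "Avoid position bias: place the correct option at a random position."
    )

def generate_instances(tasks: dict, call_gpt4, per_task: int = 10) -> dict:
    """tasks maps task name -> description; `call_gpt4` is any callable
    mapping a prompt string to model output (hypothetical interface)."""
    return {
        name: [call_gpt4(build_instance_prompt(name, desc))
               for _ in range(per_task)]
        for name, desc in tasks.items()
    }
```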
2023
Enhancing Personalized Dialogue Generation with Contrastive Latent Variables: Combining Sparse and Dense Persona
Yihong Tang | Bo Wang | Miao Fang | Dongming Zhao | Kun Huang | Ruifang He | Yuexian Hou
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Personalized dialogue explores the consistent relationship between dialogue generation and personality. Existing personalized dialogue agents model persona profiles from three resources: sparse persona descriptions, dense persona descriptions, and dialogue histories. However, sparse structured persona attributes are explicit but uninformative, dense persona texts contain rich persona descriptions but with much noise, and the dialogue history query is both noisy and uninformative for persona modeling. In this work, we combine the advantages of the three resources to obtain a richer and more accurate persona. We design a Contrastive Latent Variable-based model (CLV) that clusters dense persona descriptions into sparse categories, which are combined with the history query to generate personalized responses. Experimental results on Chinese and English datasets demonstrate our model’s superiority in personalization.
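One way to picture the contrastive clustering step: encode dense persona descriptions into latent vectors and train them against sparse category prototypes with an InfoNCE-style loss, pulling each vector toward its assigned category and pushing it away from the others. The sketch below makes that concrete under stated assumptions; the actual CLV objective may differ.

```python
# A hedged sketch of a contrastive loss over latent persona categories.
import torch.nn.functional as F

def contrastive_persona_loss(persona_z, prototypes, labels, tau=0.1):
    """persona_z: (B, d) latent persona vectors; prototypes: (K, d) sparse
    category prototypes; labels: (B,) long tensor of category indices."""
    persona_z = F.normalize(persona_z, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    # Cosine similarity to every category, scaled by temperature tau.
    logits = persona_z @ prototypes.T / tau
    # Cross-entropy pulls each vector toward its own category prototype.
    return F.cross_entropy(logits, labels)
```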