Caixia Yuan


pdf bib
Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark
Yuxing Long | Binyuan Hui | Caixia Yuan | Fei Huang | Yongbin Li | Xiaojie Wang
Findings of the Association for Computational Linguistics: ACL 2023

Existing multimodal task-oriented dialog data fails to demonstrate the diverse expressions of user subjective preferences and recommendation acts in the real-life shopping scenario. This paper introduces a new dataset SURE (Multimodal Recommendation Dialog with Subjective Preference), which contains 12K shopping dialogs in complex store scenes. The data is built in two phases with human annotations to ensure quality and diversity. SURE is well-annotated with subjective preferences and recommendation acts proposed by sales experts. A comprehensive analysis is given to reveal the distinguishing features of SURE. Three benchmark tasks are then proposed on the data to evaluate the capability of multimodal recommendation agents. Basing on the SURE, we propose a baseline model, powered by a state-of-the-art multimodal model, for these tasks.

pdf bib
Explicit Alignment and Many-to-many Entailment Based Reasoning for Conversational Machine Reading
Yangyang Luo | Shiyu Tian | Caixia Yuan | Xiaojie Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Conversational Machine Reading (CMR) requires answering a user’s initial question through multi-turn dialogue interactions based on a given document. Although there exist many effective methods, they largely neglected the alignment between the document and the user-provided information, which significantly affects the intermediate decision-making and subsequent follow-up question generation. To address this issue, we propose a pipeline framework that (1) aligns the aforementioned two sides in an explicit way, (2) makes decisions using a lightweight many-to-many entailment reasoning module, and (3) directly generates follow-up questions based on the document and previously asked questions. Our proposed method achieves state-of-the-art in micro-accuracy and ranks the first place on the public leaderboard of the CMR benchmark dataset ShARC.

pdf bib
Improving Situated Conversational Agents with Step-by-Step Multi-modal Logic Reasoning
Yuxing Long | Huibin Zhang | Binyuan Hui | Zhenglu Yang | Caixia Yuan | Xiaojie Wang | Fei Huang | Yongbin Li
Proceedings of The Eleventh Dialog System Technology Challenge

To fulfill complex user requirements in a situated conversational scenario, the agent needs to conduct step-by-step multi-modal logic reasoning, which includes locating objects, querying information and searching objects. However, existing methods omit this multi-step procedure and therefore constitutes the risk of shortcuts when making predictions. For example, they may directly copy the information from the dialogue history or simply use the textual description without perform visual reasoning. To address this issue and further boost the system performance, we apply the dual process theory to plug a reasoner into the original transformer based model for step-by-step reasoning. When system 2 completes multi-step reasoning, its output is regarded as final prediction. Our proposed method achieved the 1st rank on the summing scores across all four DSTC-11 SIMMC 2.1 sub-tasks.


pdf bib
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
Song Feng | Hui Wan | Caixia Yuan | Han Yu
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

pdf bib
Learn to Adapt for Generalized Zero-Shot Text Classification
Yiwen Zhang | Caixia Yuan | Xiaojie Wang | Ziwei Bai | Yongbin Liu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Generalized zero-shot text classification aims to classify textual instances from both previously seen classes and incrementally emerging unseen classes. Most existing methods generalize poorly since the learned parameters are only optimal for seen classes rather than for both classes, and the parameters keep stationary in predicting procedures. To address these challenges, we propose a novel Learn to Adapt (LTA) network using a variant meta-learning framework. Specifically, LTA trains an adaptive classifier by using both seen and virtual unseen classes to simulate a generalized zero-shot learning (GZSL) scenario in accordance with the test time, and simultaneously learns to calibrate the class prototypes and sample representations to make the learned parameters adaptive to incoming unseen classes. We claim that the proposed model is capable of representing all prototypes and samples from both classes to a more consistent distribution in a global space. Extensive experiments on five text classification datasets show that our model outperforms several competitive previous approaches by large margins. The code and the whole datasets are available at

pdf bib
A Slot Is Not Built in One Utterance: Spoken Language Dialogs with Sub-Slots
Sai Zhang | Yuwei Hu | Yuchuan Wu | Jiaman Wu | Yongbin Li | Jian Sun | Caixia Yuan | Xiaojie Wang
Findings of the Association for Computational Linguistics: ACL 2022

A slot value might be provided segment by segment over multiple-turn interactions in a dialog, especially for some important information such as phone numbers and names. It is a common phenomenon in daily life, but little attention has been paid to it in previous work. To fill the gap, this paper defines a new task named Sub-Slot based Task-Oriented Dialog (SSTOD) and builds a Chinese dialog dataset SSD for boosting research on SSTOD. The dataset includes a total of 40K dialogs and 500K utterances from four different domains: Chinese names, phone numbers, ID numbers and license plate numbers. The data is well annotated with sub-slot values, slot values, dialog states and actions. We find some new linguistic phenomena and interactive manners in SSTOD which raise critical challenges of building dialog agents for the task. We test three state-of-the-art dialog models on SSTOD and find they cannot handle the task well on any of the four domains. We also investigate an improved model by involving slot knowledge in a plug-in manner. More work should be done to meet the new challenges raised from SSTOD which widely exists in real-life applications. The dataset and code are publicly available via


pdf bib
Slot Transferability for Cross-domain Slot Filling
Hengtong Lu | Zhuoxin Han | Caixia Yuan | Xiaojie Wang | Shuyu Lei | Huixing Jiang | Wei Wu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Topic-Aware Contrastive Learning for Abstractive Dialogue Summarization
Junpeng Liu | Yanyan Zou | Hainan Zhang | Hongshen Chen | Zhuoye Ding | Caixia Yuan | Xiaojie Wang
Findings of the Association for Computational Linguistics: EMNLP 2021

Unlike well-structured text, such as news reports and encyclopedia articles, dialogue content often comes from two or more interlocutors, exchanging information with each other. In such a scenario, the topic of a conversation can vary upon progression and the key information for a certain topic is often scattered across multiple utterances of different speakers, which poses challenges to abstractly summarize dialogues. To capture the various topic information of a conversation and outline salient facts for the captured topics, this work proposes two topic-aware contrastive learning objectives, namely coherence detection and sub-summary generation objectives, which are expected to implicitly model the topic change and handle information scattering challenges for the dialogue summarization task. The proposed contrastive objectives are framed as auxiliary tasks for the primary dialogue summarization task, united via an alternative parameter updating strategy. Extensive experiments on benchmark datasets demonstrate that the proposed simple method significantly outperforms strong baselines and achieves new state-of-the-art performance. The code and trained models are publicly available via .

pdf bib
Task-Oriented Clustering for Dialogues
Chenxu Lv | Hengtong Lu | Shuyu Lei | Huixing Jiang | Wei Wu | Caixia Yuan | Xiaojie Wang
Findings of the Association for Computational Linguistics: EMNLP 2021

A reliable clustering algorithm for task-oriented dialogues can help developer analysis and define dialogue tasks efficiently. It is challenging to directly apply prior normal text clustering algorithms for task-oriented dialogues, due to the inherent differences between them, such as coreference, omission and diversity expression. In this paper, we propose a Dialogue Task Clustering Network model for task-oriented clustering. The proposed model combines context-aware utterance representations and cross-dialogue utterance cluster representations for task-oriented dialogues clustering. An iterative end-to-end training strategy is utilized for dialogue clustering and representation learning jointly. Experiments on three public datasets show that our model significantly outperform strong baselines in all metrics.


pdf bib
MrMep: Joint Extraction of Multiple Relations and Multiple Entity Pairs Based on Triplet Attention
Jiayu Chen | Caixia Yuan | Xiaojie Wang | Ziwei Bai
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

This paper focuses on how to extract multiple relational facts from unstructured text. Neural encoder-decoder models have provided a viable new approach for jointly extracting relations and entity pairs. However, these models either fail to deal with entity overlapping among relational facts, or neglect to produce the whole entity pairs. In this work, we propose a novel architecture that augments the encoder and decoder in two elegant ways. First, we apply a binary CNN classifier for each relation, which identifies all possible relations maintained in the text, while retaining the target relation representation to aid entity pair recognition. Second, we perform a multi-head attention over the text and a triplet attention with the target relation interacting with every token of the text to precisely produce all possible entity pairs in a sequential manner. Experiments on three benchmark datasets show that our proposed method successfully addresses the multiple relations and multiple entity pairs even with complex overlapping and significantly outperforms the state-of-the-art methods.


pdf bib
Response Generation in Dialogue Using a Tailored PCFG Parser
Caixia Yuan | Xiaojie Wang | Qianhui He
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)


pdf bib
Leveraging Rich Linguistic Features for Cross-domain Chinese Segmentation
Guohua Wu | Dezhu He | Keli Zhong | Xue Zhou | Caixia Yuan
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing


pdf bib
Cascaded Chinese Weibo Segmentation Based on CRFs
Keli Zhong | Xue Zhou | Hangyu Li | Caixia Yuan
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing


pdf bib
A Chinese LPCFG Parser with Hybrid Character Information
Wenzhi Xu | Chaobo Sun | Caixia Yuan
CIPS-SIGHAN Joint Conference on Chinese Language Processing


pdf bib
Accurate Learning for Chinese Function Tags from Minimal Features
Caixia Yuan | Fuji Ren | Xiaojie Wang
Proceedings of the ACL-IJCNLP 2009 Student Research Workshop


pdf bib
BUPT Systems in the SIGHAN Bakeoff 2007
Ying Qin | Caixia Yuan | Jiashen Sun | Xiaojie Wang
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing