Xiubo Geng


pdf bib
ClarET: Pre-training a Correlation-Aware Context-To-Event Transformer for Event-Centric Generation and Classification
Yucheng Zhou | Tao Shen | Xiubo Geng | Guodong Long | Daxin Jiang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Generating new events given context with correlated ones plays a crucial role in many event-centric reasoning tasks. Existing works either limit their scope to specific scenarios or overlook event-level correlations. In this paper, we propose to pre-train a general Correlation-aware context-to-Event Transformer (ClarET) for event-centric reasoning. To achieve this, we propose three novel event-centric objectives, i.e., whole event recovering, contrastive event-correlation encoding and prompt-based event locating, which highlight event-level correlations with effective training. The proposed ClarET is applicable to a wide range of event-centric reasoning scenarios, considering its versatility of (i) event-correlation types (e.g., causal, temporal, contrast), (ii) application formulations (i.e., generation and classification), and (iii) reasoning types (e.g., abductive, counterfactual and ending reasoning). Empirical fine-tuning results, as well as zero- and few-shot learning, on 9 benchmarks (5 generation and 4 classification tasks covering 4 reasoning types with diverse event correlations), verify its effectiveness and generalization ability.

pdf bib
Multimodal Dialogue Response Generation
Qingfeng Sun | Yujing Wang | Can Xu | Kai Zheng | Yaming Yang | Huang Hu | Fei Xu | Jessica Zhang | Xiubo Geng | Daxin Jiang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Responsing with image has been recognized as an important capability for an intelligent conversational agent. Yet existing works only focus on exploring the multimodal dialogue models which depend on retrieval-based methods, but neglecting generation methods. To fill in the gaps, we first present a new task: multimodal dialogue response generation (MDRG) - given the dialogue history, one model needs to generate a text sequence or an image as response. Learning such a MDRG model often requires multimodal dialogues containing both texts and images which are difficult to obtain. Motivated by the challenge in practice, we consider MDRG under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of text-only dialogues and text-image pairs respectively, then the whole parameters can be well fitted using the limited training examples. Extensive experiments demonstrate our method achieves state-of-the-art results in both automatic and human evaluation, and can generate informative text and high-resolution image responses.

pdf bib
PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks
Yufei Wang | Can Xu | Qingfeng Sun | Huang Hu | Chongyang Tao | Xiubo Geng | Daxin Jiang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper focuses on the Data Augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose Prompt-based Data Augmentation model (PromDA) which only trains small-scale Soft Prompt (i.e., a set of trainable vectors) in the frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out the low-quality data using NLU models. Experiments on four benchmarks show that synthetic data produced by PromDA successfully boost up the performance of NLU models which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary with unlabeled in-domain data. The NLU models can be further improved when they are combined for training.

pdf bib
HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations
Jia-Chen Gu | Chao-Hong Tan | Chongyang Tao | Zhen-Hua Ling | Huang Hu | Xiubo Geng | Daxin Jiang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recently, various response generation models for two-party conversations have achieved impressive improvements, but less effort has been paid to multi-party conversations (MPCs) which are more practical and complicated. Compared with a two-party conversation where a dialogue context is a sequence of utterances, building a response generation model for MPCs is more challenging, since there exist complicated context structures and the generated responses heavily rely on both interlocutors (i.e., speaker and addressee) and history utterances. To address these challenges, we present HeterMPC, a heterogeneous graph-based neural network for response generation in MPCs which models the semantics of utterances and interlocutors simultaneously with two types of nodes in a graph. Besides, we also design six types of meta relations with node-edge-type-dependent parameters to characterize the heterogeneous interactions within the graph. Through multi-hop updating, HeterMPC can adequately utilize the structural knowledge of conversations for response generation. Experimental results on the Ubuntu Internet Relay Chat (IRC) channel benchmark show that HeterMPC outperforms various baseline models for response generation in MPCs.

pdf bib
TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge
Chao-Hong Tan | Jia-Chen Gu | Chongyang Tao | Zhen-Hua Ling | Can Xu | Huang Hu | Xiubo Geng | Daxin Jiang
Findings of the Association for Computational Linguistics: ACL 2022

Generating natural and informative texts has been a long-standing problem in NLP. Much effort has been dedicated into incorporating pre-trained language models (PLMs) with various open-world knowledge, such as knowledge graphs or wiki pages. However, their ability to access and manipulate the task-specific knowledge is still limited on downstream tasks, as this type of knowledge is usually not well covered in PLMs and is hard to acquire. To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework. Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively on the basis of PLMs. With the help of these two types of knowledge, our model can learn what and how to generate. Experiments on two text generation tasks of dialogue generation and question generation, and on two datasets show that our method achieves better performance than various baseline models.


pdf bib
Improving Zero-Shot Cross-lingual Transfer for Multilingual Question Answering over Knowledge Graph
Yucheng Zhou | Xiubo Geng | Tao Shen | Wenqiang Zhang | Daxin Jiang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Multilingual question answering over knowledge graph (KGQA) aims to derive answers from a knowledge graph (KG) for questions in multiple languages. To be widely applicable, we focus on its zero-shot transfer setting. That is, we can only access training data in a high-resource language, while need to answer multilingual questions without any labeled data in target languages. A straightforward approach is resorting to pre-trained multilingual models (e.g., mBERT) for cross-lingual transfer, but there is a still significant gap of KGQA performance between source and target languages. In this paper, we exploit unsupervised bilingual lexicon induction (BLI) to map training questions in source language into those in target language as augmented training data, which circumvents language inconsistency between training and inference. Furthermore, we propose an adversarial learning strategy to alleviate syntax-disorder of the augmented data, making the model incline to both language- and syntax-independence. Consequently, our model narrows the gap in zero-shot cross-lingual transfer. Experiments on two multilingual KGQA datasets with 11 zero-resource languages verify its effectiveness.

pdf bib
Modeling Event-Pair Relations in External Knowledge Graphs for Script Reasoning
Yucheng Zhou | Xiubo Geng | Tao Shen | Jian Pei | Wenqiang Zhang | Daxin Jiang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding
Jia-Chen Gu | Chongyang Tao | Zhenhua Ling | Can Xu | Xiubo Geng | Daxin Jiang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recently, various neural models for multi-party conversation (MPC) have achieved impressive improvements on a variety of tasks such as addressee recognition, speaker identification and response prediction. However, these existing methods on MPC usually represent interlocutors and utterances individually and ignore the inherent complicated structure in MPC which may provide crucial interlocutor and utterance semantics and would enhance the conversation understanding process. To this end, we present MPC-BERT, a pre-trained model for MPC understanding that considers learning who says what to whom in a unified model with several elaborated self-supervised tasks. Particularly, these tasks can be generally categorized into (1) interlocutor structure modeling including reply-to utterance recognition, identical speaker searching and pointer consistency distinction, and (2) utterance semantics modeling including masked shared utterance restoration and shared node detection. We evaluate MPC-BERT on three downstream tasks including addressee recognition, speaker identification and response selection. Experimental results show that MPC-BERT outperforms previous methods by large margins and achieves new state-of-the-art performance on all three downstream tasks at two benchmarks.

pdf bib
Reasoning over Entity-Action-Location Graph for Procedural Text Understanding
Hao Huang | Xiubo Geng | Jian Pei | Guodong Long | Daxin Jiang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Procedural text understanding aims at tracking the states (e.g., create, move, destroy) and locations of the entities mentioned in a given paragraph. To effectively track the states and locations, it is essential to capture the rich semantic relations between entities, actions, and locations in the paragraph. Although recent works have achieved substantial progress, most of them focus on leveraging the inherent constraints or incorporating external knowledge for state prediction. The rich semantic relations in the given paragraph are largely overlooked. In this paper, we propose a novel approach (REAL) to procedural text understanding, where we build a general framework to systematically model the entity-entity, entity-action, and entity-location relations using a graph neural network. We further develop algorithms for graph construction, representation learning, and state and location tracking. We evaluate the proposed approach on two benchmark datasets, ProPara, and Recipes. The experimental results show that our method outperforms strong baselines by a large margin, i.e., 5.0% on ProPara and 3.2% on Recipes, illustrating the utility of semantic relations and the effectiveness of the graph-based reasoning model.

pdf bib
Maria: A Visual Experience Powered Conversational Agent
Zujie Liang | Huang Hu | Can Xu | Chongyang Tao | Xiubo Geng | Yining Chen | Fan Liang | Daxin Jiang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Arguably, the visual perception of conversational agents to the physical world is a key way for them to exhibit the human-like intelligence. Image-grounded conversation is thus proposed to address this challenge. Existing works focus on exploring the multimodal dialog models that ground the conversation on a given image. In this paper, we take a step further to study image-grounded conversation under a fully open-ended setting where no paired dialog and image are assumed available. Specifically, we present Maria, a neural conversation agent powered by the visual world experiences which are retrieved from a large-scale image index. Maria consists of three flexible components, i.e., text-to-image retriever, visual concept detector and visual-knowledge-grounded response generator. The retriever aims to retrieve a correlated image to the dialog from an image index, while the visual concept detector extracts rich visual knowledge from the image. Then, the response generator is grounded on the extracted visual knowledge and dialog context to generate the target response. Extensive experiments demonstrate Maria outperforms previous state-of-the-art methods on automatic metrics and human evaluation, and can generate informative responses that have some visual commonsense of the physical world.

pdf bib
Learning Neural Templates for Recommender Dialogue System
Zujie Liang | Huang Hu | Can Xu | Jian Miao | Yingying He | Yining Chen | Xiubo Geng | Fan Liang | Daxin Jiang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The task of Conversational Recommendation System (CRS), i.e., recommender dialog system, aims to recommend precise items to users through natural language interactions. Though recent end-to-end neural models have shown promising progress on this task, two key challenges still remain. First, the recommended items cannot be always incorporated into the generated response precisely and appropriately. Second, only the items mentioned in the training corpus have a chance to be recommended in the conversation. To tackle these challenges, we introduce a novel framework called NTRD for recommender dialogue system that can decouple the dialogue generation from the item recommendation. NTRD has two key components, i.e., response template generator and item selector. The former adopts an encoder-decoder model to generate a response template with slot locations tied to target items, while the latter fills in slot locations with the proper items using a sufficient attention mechanism. Our approach combines the strengths of both classical slot filling approaches (that are generally controllable) and modern neural NLG approaches (that are generally more natural and accurate). Extensive experiments on the benchmark ReDial show our approach significantly outperforms the previous state-of-the-art methods. Besides, our approach has the unique advantage to produce novel items that do not appear in the training set of dialogue corpus. The code is available at https://github.com/jokieleung/NTRD.


pdf bib
Towards Interpretable Reasoning over Paragraph Effects in Situation
Mucheng Ren | Xiubo Geng | Tao Qin | Heyan Huang | Daxin Jiang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand the cause and effect described in a background paragraph, and apply the knowledge to a novel situation. Existing works ignore the complicated reasoning process and solve it with a one-step “black box” model. Inspired by human cognitive processes, in this paper we propose a sequential approach for this task which explicitly models each step of the reasoning process with neural network modules. In particular, five reasoning modules are designed and learned in an end-to-end manner, which leads to a more interpretable model. Experimental results on the ROPES dataset demonstrate the effectiveness and explainability of our proposed approach.


pdf bib
Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base
Tao Shen | Xiubo Geng | Tao Qin | Daya Guo | Duyu Tang | Nan Duan | Guodong Long | Daxin Jiang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We consider the problem of conversational question answering over a large-scale knowledge base. To handle huge entity vocabulary of a large-scale knowledge base, recent neural semantic parsing based approaches usually decompose the task into several subtasks and then solve them sequentially, which leads to following issues: 1) errors in earlier subtasks will be propagated and negatively affect downstream ones; and 2) each subtask cannot naturally share supervision signals with others. To tackle these issues, we propose an innovative multi-task learning framework where a pointer-equipped semantic parsing model is designed to resolve coreference in conversations, and naturally empower joint learning with a novel type-aware entity detection model. The proposed framework thus enables shared supervisions and alleviates the effect of error propagation. Experiments on a large-scale conversational question answering dataset containing 1.6M question answering pairs over 12.8M entities show that the proposed framework improves overall F1 score from 67% to 79% compared with previous state-of-the-art work.