Ying Shen


pdf bib
HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
Chenhe Dong | Yaliang Li | Ying Shen | Minghui Qiu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

On many natural language processing tasks, large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered the deployment on resource-limited devices in practice. In this paper, we target to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance the model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. And to dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.

pdf bib
Wasserstein Selective Transfer Learning for Cross-domain Text Mining
Lingyun Feng | Minghui Qiu | Yaliang Li | Haitao Zheng | Ying Shen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Transfer learning (TL) seeks to improve the learning of a data-scarce target domain by using information from source domains. However, the source and target domains usually have different data distributions, which may lead to negative transfer. To alleviate this issue, we propose a Wasserstein Selective Transfer Learning (WSTL) method. Specifically, the proposed method considers a reinforced selector to select helpful data for transfer learning. We further use a Wasserstein-based discriminator to maximize the empirical distance between the selected source data and target data. The TL module is then trained to minimize the estimated Wasserstein distance in an adversarial manner and provides domain invariant features for the reinforced selector. We adopt an evaluation metric based on the performance of the TL module as delayed reward and a Wasserstein-based metric as immediate rewards to guide the reinforced selector learning. Compared with the competing TL approaches, the proposed method selects data samples that are closer to the target domain. It also provides better state features and reward signals that lead to better performance with faster convergence. Extensive experiments on three real-world text mining tasks demonstrate the effectiveness of the proposed method.

pdf bib
Continual Learning for Task-oriented Dialogue System with Iterative Network Pruning, Expanding and Masking
Binzong Geng | Fajie Yuan | Qiancheng Xu | Ying Shen | Ruifeng Xu | Min Yang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This ability to learn consecutive tasks without forgetting how to perform previously trained problems is essential for developing an online dialogue system. This paper proposes an effective continual learning method for the task-oriented dialogue system with iterative network pruning, expanding, and masking (TPEM), which preserves performance on previously encountered tasks while accelerating learning progress on subsequent tasks. Specifically, TPEM (i) leverages network pruning to keep the knowledge for old tasks, (ii) adopts network expanding to create free weights for new tasks, and (iii) introduces task-specific network masking to alleviate the negative impact of fixed weights of old tasks on new tasks. We conduct extensive experiments on seven different tasks from three benchmark datasets and show empirically that TPEM leads to significantly improved results over the strong competitors.


pdf bib
Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents
Daoyuan Chen | Yaliang Li | Kai Lei | Ying Shen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Distant supervision based methods for entity and relation extraction have received increasing popularity due to the fact that these methods require light human annotation efforts. In this paper, we consider the problem of shifted label distribution, which is caused by the inconsistency between the noisy-labeled training set subject to external knowledge graph and the human-annotated test set, and exacerbated by the pipelined entity-then-relation extraction manner with noise propagation. We propose a joint extraction approach to address this problem by re-labeling noisy instances with a group of cooperative multiagents. To handle noisy instances in a fine-grained manner, each agent in the cooperative group evaluates the instance by calculating a continuous confidence score from its own perspective; To leverage the correlations between these two extraction tasks, a confidence consensus module is designed to gather the wisdom of all agents and re-distribute the noisy training set with confidence-scored labels. Further, the confidences are used to adjust the training losses of extractors. Experimental results on two real-world datasets verify the benefits of re-labeling noisy instance, and show that the proposed model significantly outperforms the state-of-the-art entity and relation extraction methods.

pdf bib
Summarize before Aggregate: A Global-to-local Heterogeneous Graph Inference Network for Conversational Emotion Recognition
Dongming Sheng | Dong Wang | Ying Shen | Haitao Zheng | Haozhuang Liu
Proceedings of the 28th International Conference on Computational Linguistics

Conversational Emotion Recognition (CER) is a crucial task in Natural Language Processing (NLP) with wide applications. Prior works in CER generally focus on modeling emotion influences solely with utterance-level features, with little attention paid on phrase-level semantic connection between utterances. Phrases carry sentiments when they are referred to emotional events under certain topics, providing a global semantic connection between utterances throughout the entire conversation. In this work, we propose a two-stage Summarization and Aggregation Graph Inference Network (SumAggGIN), which seamlessly integrates inference for topic-related emotional phrases and local dependency reasoning over neighbouring utterances in a global-to-local fashion. Topic-related emotional phrases, which constitutes the global topic-related emotional connections, are recognized by our proposed heterogeneous Summarization Graph. Local dependencies, which captures short-term emotional effects between neighbouring utterances, are further injected via an Aggregation Graph to distinguish the subtle differences between utterances containing emotional phrases. The two steps of graph inference are tightly-coupled for a comprehensively understanding of emotional fluctuation. Experimental results on three CER benchmark datasets verify the effectiveness of our proposed model, which outperforms the state-of-the-art approaches.

pdf bib
Integrating User History into Heterogeneous Graph for Dialogue Act Recognition
Dong Wang | Ziran Li | Haitao Zheng | Ying Shen
Proceedings of the 28th International Conference on Computational Linguistics

Dialogue Act Recognition (DAR) is a challenging problem in Natural Language Understanding, which aims to attach Dialogue Act (DA) labels to each utterance in a conversation. However, previous studies cannot fully recognize the specific expressions given by users due to the informality and diversity of natural language expressions. To solve this problem, we propose a Heterogeneous User History (HUH) graph convolution network, which utilizes the user’s historical answers grouped by DA labels as additional clues to recognize the DA label of utterances. To handle the noise caused by introducing the user’s historical answers, we design sets of denoising mechanisms, including a History Selection process, a Similarity Re-weighting process, and an Edge Re-weighting process. We evaluate the proposed method on two benchmark datasets MSDialog and MRDA. The experimental results verify the effectiveness of integrating user’s historical answers, and show that our proposed model outperforms the state-of-the-art methods.

pdf bib
Answer-driven Deep Question Generation based on Reinforcement Learning
Liuyin Wang | Zihan Xu | Zibo Lin | Haitao Zheng | Ying Shen
Proceedings of the 28th International Conference on Computational Linguistics

Deep question generation (DQG) aims to generate complex questions through reasoning over multiple documents. The task is challenging and underexplored. Existing methods mainly focus on enhancing document representations, with little attention paid to the answer information, which may result in the generated question not matching the answer type and being answerirrelevant. In this paper, we propose an Answer-driven Deep Question Generation (ADDQG) model based on the encoder-decoder framework. The model makes better use of the target answer as a guidance to facilitate question generation. First, we propose an answer-aware initialization module with a gated connection layer which introduces both document and answer information to the decoder, thus helping to guide the choice of answer-focused question words. Then a semantic-rich fusion attention mechanism is designed to support the decoding process, which integrates the answer with the document representations to promote the proper handling of answer information during generation. Moreover, reinforcement learning is applied to integrate both syntactic and semantic metrics as the reward to enhance the training of the ADDQG. Extensive experiments on the HotpotQA dataset show that ADDQG outperforms state-of-the-art models in both automatic and human evaluations.

pdf bib
Amalgamating Knowledge from Two Teachers for Task-oriented Dialogue System with Adversarial Training
Wanwei He | Min Yang | Rui Yan | Chengming Li | Ying Shen | Ruifeng Xu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The challenge of both achieving task completion by querying the knowledge base and generating human-like responses for task-oriented dialogue systems is attracting increasing research attention. In this paper, we propose a “Two-Teacher One-Student” learning framework (TTOS) for task-oriented dialogue, with the goal of retrieving accurate KB entities and generating human-like responses simultaneously. TTOS amalgamates knowledge from two teacher networks that together provide comprehensive guidance to build a high-quality task-oriented dialogue system (student network). Each teacher network is trained via reinforcement learning with a goal-specific reward, which can be viewed as an expert towards the goal and transfers the professional characteristic to the student network. Instead of adopting the classic student-teacher learning of forcing the output of a student network to exactly mimic the soft targets produced by the teacher networks, we introduce two discriminators as in generative adversarial network (GAN) to transfer knowledge from two teachers to the student. The usage of discriminators relaxes the rigid coupling between the student and teachers. Extensive experiments on two benchmark datasets (i.e., CamRest and In-Car Assistant) demonstrate that TTOS significantly outperforms baseline methods.


pdf bib
Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge
Ziran Li | Ning Ding | Zhiyuan Liu | Haitao Zheng | Ying Shen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Chinese relation extraction is conducted using neural networks with either character-based or word-based inputs, and most existing methods typically suffer from segmentation errors and ambiguity of polysemy. To address the issues, we propose a multi-grained lattice framework (MG lattice) for Chinese relation extraction to take advantage of multi-grained language information and external linguistic knowledge. In this framework, (1) we incorporate word-level information into character sequence inputs so that segmentation errors can be avoided. (2) We also model multiple senses of polysemous words with the help of external linguistic knowledge, so as to alleviate polysemy ambiguity. Experiments on three real-world datasets in distinct domains show consistent and significant superiority and robustness of our model, as compared with other baselines. We will release the source code of this paper in the future.


pdf bib
Efficient Low-rank Multimodal Fusion With Modality-Specific Factors
Zhun Liu | Ying Shen | Varun Bharadhwaj Lakshminarasimhan | Paul Pu Liang | AmirAli Bagher Zadeh | Louis-Philippe Morency
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal research is an emerging field of artificial intelligence, and one of the main research problems in this field is multimodal fusion. The fusion of multimodal data is the process of integrating multiple unimodal representations into one compact multimodal representation. Previous research in this field has exploited the expressiveness of tensors for multimodal representation. However, these methods often suffer from exponential increase in dimensions and in computational complexity introduced by transformation of input into tensor. In this paper, we propose the Low-rank Multimodal Fusion method, which performs multimodal fusion using low-rank tensors to improve efficiency. We evaluate our model on three different tasks: multimodal sentiment analysis, speaker trait analysis, and emotion recognition. Our model achieves competitive results on all these tasks while drastically reducing computational complexity. Additional experiments also show that our model can perform robustly for a wide range of low-rank settings, and is indeed much more efficient in both training and inference compared to other methods that utilize tensor representations.

pdf bib
Cooperative Denoising for Distantly Supervised Relation Extraction
Kai Lei | Daoyuan Chen | Yaliang Li | Nan Du | Min Yang | Wei Fan | Ying Shen
Proceedings of the 27th International Conference on Computational Linguistics

Distantly supervised relation extraction greatly reduces human efforts in extracting relational facts from unstructured texts. However, it suffers from noisy labeling problem, which can degrade its performance. Meanwhile, the useful information expressed in knowledge graph is still underutilized in the state-of-the-art methods for distantly supervised relation extraction. In the light of these challenges, we propose CORD, a novelCOopeRativeDenoising framework, which consists two base networks leveraging text corpus and knowledge graph respectively, and a cooperative module involving their mutual learning by the adaptive bi-directional knowledge distillation and dynamic ensemble with noisy-varying instances. Experimental results on a real-world dataset demonstrate that the proposed method reduces the noisy labels and achieves substantial improvement over the state-of-the-art methods.

pdf bib
Aspect and Sentiment Aware Abstractive Review Summarization
Min Yang | Qiang Qu | Ying Shen | Qiao Liu | Wei Zhao | Jia Zhu
Proceedings of the 27th International Conference on Computational Linguistics

Review text has been widely studied in traditional tasks such as sentiment analysis and aspect extraction. However, to date, no work is towards the abstractive review summarization that is essential for business organizations and individual consumers to make informed decisions. This work takes the lead to study the aspect/sentiment-aware abstractive review summarization by exploring multi-factor attentions. Specifically, we propose an interactive attention mechanism to interactively learns the representations of context words, sentiment words and aspect words within the reviews, acted as an encoder. The learned sentiment and aspect representations are incorporated into the decoder to generate aspect/sentiment-aware review summaries via an attention fusion network. In addition, the abstractive summarizer is jointly trained with the text categorization task, which helps learn a category-specific text encoder, locating salient aspect information and exploring the variations of style and wording of content with respect to different text categories. The experimental results on a real-life dataset demonstrate that our model achieves impressive results compared to other strong competitors.

pdf bib
Knowledge as A Bridge: Improving Cross-domain Answer Selection with External Knowledge
Yang Deng | Ying Shen | Min Yang | Yaliang Li | Nan Du | Wei Fan | Kai Lei
Proceedings of the 27th International Conference on Computational Linguistics

Answer selection is an important but challenging task. Significant progresses have been made in domains where a large amount of labeled training data is available. However, obtaining rich annotated data is a time-consuming and expensive process, creating a substantial barrier for applying answer selection models to a new domain which has limited labeled data. In this paper, we propose Knowledge-aware Attentive Network (KAN), a transfer learning framework for cross-domain answer selection, which uses the knowledge base as a bridge to enable knowledge transfer from the source domain to the target domains. Specifically, we design a knowledge module to integrate the knowledge-based representational learning into answer selection models. The learned knowledge-based representations are shared by source and target domains, which not only leverages large amounts of cross-domain data, but also benefits from a regularization effect that leads to more general representations to help tasks in new domains. To verify the effectiveness of our model, we use SQuAD-T dataset as the source domain and three other datasets (i.e., Yahoo QA, TREC QA and InsuranceQA) as the target domains. The experimental results demonstrate that KAN has remarkable applicability and generality, and consistently outperforms the strong competitors by a noticeable margin for cross-domain answer selection.