Ruifeng Xu


2021

pdf bib
Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment Analysis with Affective Knowledge
Bin Liang | Hang Su | Rongdi Yin | Lin Gui | Min Yang | Qin Zhao | Xiaoqi Yu | Ruifeng Xu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we investigate the Aspect Category Sentiment Analysis (ACSA) task from a novel perspective by exploring a Beta Distribution guided aspect-aware graph construction based on external knowledge. That is, we are no longer entangled about how to laboriously search the sentiment clues for coarse-grained aspects from the context, but how to preferably find the words highly related to the aspects in the context and determine their importance based on the public knowledge base. In this way, the contextual sentiment clues can be explicitly tracked in ACSA for the aspects in the light of these aspect-related words. To be specific, we first regard each aspect as a pivot to derive aspect-aware words that are highly related to the aspect from external affective commonsense knowledge. Then, we employ Beta Distribution to educe the aspect-aware weight, which reflects the importance to the aspect, for each aspect-aware word. Afterward, the aspect-aware words are served as the substitutes of the coarse-grained aspect to construct graphs for leveraging the aspect-related contextual sentiment dependencies in ACSA. Experiments on 6 benchmark datasets show that our approach significantly outperforms the state-of-the-art baseline methods.

pdf bib
Progressive Self-Training with Discriminator for Aspect Term Extraction
Qianlong Wang | Zhiyuan Wen | Qin Zhao | Min Yang | Ruifeng Xu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Aspect term extraction aims to extract aspect terms from a review sentence that users have expressed opinions on. One of the remaining challenges for aspect term extraction resides in the lack of sufficient annotated data. While self-training is potentially an effective method to address this issue, the pseudo-labels it yields on unlabeled data could induce noise. In this paper, we use two means to alleviate the noise in the pseudo-labels. One is that inspired by the curriculum learning, we refine the conventional self-training to progressive self-training. Specifically, the base model infers pseudo-labels on a progressive subset at each iteration, where samples in the subset become harder and more numerous as the iteration proceeds. The other is that we use a discriminator to filter the noisy pseudo-labels. Experimental results on four SemEval datasets show that our model significantly outperforms the previous baselines and achieves state-of-the-art performance.

pdf bib
An Empirical Study on Multiple Information Sources for Zero-Shot Fine-Grained Entity Typing
Yi Chen | Haiyun Jiang | Lemao Liu | Shuming Shi | Chuang Fan | Min Yang | Ruifeng Xu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Auxiliary information from multiple sources has been demonstrated to be effective in zero-shot fine-grained entity typing (ZFET). However, there lacks a comprehensive understanding about how to make better use of the existing information sources and how they affect the performance of ZFET. In this paper, we empirically study three kinds of auxiliary information: context consistency, type hierarchy and background knowledge (e.g., prototypes and descriptions) of types, and propose a multi-source fusion model (MSF) targeting these sources. The performance obtains up to 11.42% and 22.84% absolute gains over state-of-the-art baselines on BBN and Wiki respectively with regard to macro F1 scores. More importantly, we further discuss the characteristics, merits and demerits of each information source and provide an intuitive understanding of the complementarity among them.

pdf bib
Argument Pair Extraction with Mutual Guidance and Inter-sentence Relation Graph
Jianzhu Bao | Bin Liang | Jingyi Sun | Yice Zhang | Min Yang | Ruifeng Xu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Argument pair extraction (APE) aims to extract interactive argument pairs from two passages of a discussion. Previous work studied this task in the context of peer review and rebuttal, and decomposed it into a sequence labeling task and a sentence relation classification task. However, despite the promising performance, such an approach obtains the argument pairs implicitly by the two decomposed tasks, lacking explicitly modeling of the argument-level interactions between argument pairs. In this paper, we tackle the APE task by a mutual guidance framework, which could utilize the information of an argument in one passage to guide the identification of arguments that can form pairs with it in another passage. In this manner, two passages can mutually guide each other in the process of APE. Furthermore, we propose an inter-sentence relation graph to effectively model the inter-relations between two sentences and thus facilitates the extraction of argument pairs. Our proposed method can better represent the holistic argument-level semantics and thus explicitly capture the complex correlations between argument pairs. Experimental results show that our approach significantly outperforms the current state-of-the-art model.

pdf bib
REAM: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation
Jun Gao | Wei Bi | Ruifeng Xu | Shuming Shi
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations
Jun Gao | Yuhan Liu | Haolin Deng | Wei Wang | Yu Cao | Jiachen Du | Ruifeng Xu
Findings of the Association for Computational Linguistics: EMNLP 2021

Current approaches to empathetic response generation focus on learning a model to predict an emotion label and generate a response based on this label and have achieved promising results. However, the emotion cause, an essential factor for empathetic responding, is ignored. The emotion cause is a stimulus for human emotions. Recognizing the emotion cause is helpful to better understand human emotions so as to generate more empathetic responses. To this end, we propose a novel framework that improves empathetic response generation by recognizing emotion cause in conversations. Specifically, an emotion reasoner is designed to predict a context emotion label and a sequence of emotion cause-oriented labels, which indicate whether the word is related to the emotion cause. Then we devise both hard and soft gated attention mechanisms to incorporate the emotion cause into response generation. Experiments show that incorporating emotion cause information improves the performance of the model on both emotion recognition and response generation.

pdf bib
HITSZ-HLT at SemEval-2021 Task 5: Ensemble Sequence Labeling and Span Boundary Detection for Toxic Span Detection
Qinglin Zhu | Zijie Lin | Yice Zhang | Jingyi Sun | Xiang Li | Qihui Lin | Yixue Dang | Ruifeng Xu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents the winning system that participated in SemEval-2021 Task 5: Toxic Spans Detection. This task aims to locate those spans that attribute to the text’s toxicity within a text, which is crucial for semi-automated moderation in online discussions. We formalize this task as the Sequence Labeling (SL) problem and the Span Boundary Detection (SBD) problem separately and employ three state-of-the-art models. Next, we integrate predictions of these models to produce a more credible and complement result. Our system achieves a char-level score of 70.83%, ranking 1/91. In addition, we also explore the lexicon-based method, which is strongly interpretable and flexible in practice.

pdf bib
A Neural Transition-based Model for Argumentation Mining
Jianzhu Bao | Chuang Fan | Jipeng Wu | Yixue Dang | Jiachen Du | Ruifeng Xu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The goal of argumentation mining is to automatically extract argumentation structures from argumentative texts. Most existing methods determine argumentative relations by exhaustively enumerating all possible pairs of argument components, which suffer from low efficiency and class imbalance. Moreover, due to the complex nature of argumentation, there is, so far, no universal method that can address both tree and non-tree structured argumentation. Towards these issues, we propose a neural transition-based model for argumentation mining, which incrementally builds an argumentation graph by generating a sequence of actions, avoiding inefficient enumeration operations. Furthermore, our model can handle both tree and non-tree structured argumentation without introducing any structural constraints. Experimental results show that our model achieves the best performance on two public datasets of different structures.

pdf bib
Continual Learning for Task-oriented Dialogue System with Iterative Network Pruning, Expanding and Masking
Binzong Geng | Fajie Yuan | Qiancheng Xu | Ying Shen | Ruifeng Xu | Min Yang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This ability to learn consecutive tasks without forgetting how to perform previously trained problems is essential for developing an online dialogue system. This paper proposes an effective continual learning method for the task-oriented dialogue system with iterative network pruning, expanding, and masking (TPEM), which preserves performance on previously encountered tasks while accelerating learning progress on subsequent tasks. Specifically, TPEM (i) leverages network pruning to keep the knowledge for old tasks, (ii) adopts network expanding to create free weights for new tasks, and (iii) introduces task-specific network masking to alleviate the negative impact of fixed weights of old tasks on new tasks. We conduct extensive experiments on seven different tasks from three benchmark datasets and show empirically that TPEM leads to significantly improved results over the strong competitors.

2020

pdf bib
Transition-based Directed Graph Construction for Emotion-Cause Pair Extraction
Chuang Fan | Chaofa Yuan | Jiachen Du | Lin Gui | Min Yang | Ruifeng Xu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Emotion-cause pair extraction aims to extract all potential pairs of emotions and corresponding causes from unannotated emotion text. Most existing methods are pipelined framework, which identifies emotions and extracts causes separately, leading to a drawback of error propagation. Towards this issue, we propose a transition-based model to transform the task into a procedure of parsing-like directed graph construction. The proposed model incrementally generates the directed graph with labeled edges based on a sequence of actions, from which we can recognize emotions with the corresponding causes simultaneously, thereby optimizing separate subtasks jointly and maximizing mutual benefits of tasks interdependently. Experimental results show that our approach achieves the best performance, outperforming the state-of-the-art methods by 6.71% (p<0.01) in F1 measure.

pdf bib
结合金融领域情感词典和注意力机制的细粒度情感分析(Attention-based Recurrent Network Combined with Financial Lexicon for Aspect-level Sentiment Classification)
Qinglin Zhu (祝清麟) | Bin Liang (梁斌) | Liuyu Han (刘宇瀚) | Yi Chen (陈奕) | Ruifeng Xu (徐睿峰) | Ruibin Mao (毛瑞彬)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

针对在金融领域实体级情感分析任务中,往往缺乏足够的标注语料,以及通用的情感分析模型难以有效处理金融文本等问题。本文构建一个百万级别的金融领域实体情感分析语料库,并标注五千余个金融领域情感词作为金融领域情感词典。同时,基于该金融领域数据集,提出一种结合金融领域情感词典和注意力机制的金融文本细粒度情感分析模型。该模型使用两个LSTM网络分别提取词级别的语义信息和基于情感词典分类后的词类级别信息,能有效获取金融领域词语的特征信息。此外,为了让文本中金融领域情感词获得更多关注,提出一种基于金融领域情感词典的注意力机制来为不同实体获取重要的情感信息。最终在构建的金融领域实体级语料库上进行实验,取得了比对比模型更好的效果。

pdf bib
基于循环交互注意力网络的问答立场分析(A Recurrent Interactive Attention Network for Answer Stance Analysis)
Wangda Luo (骆旺达) | Yuhan Liu (刘宇瀚) | Bin Liang (梁斌) | Ruifeng Xu (徐睿峰)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

针对问答立场任务中,现有方法难以提取问答文本间的依赖关系问题,本文提出一种基于循环交互注意力(Recurrent Interactive Attention, RIA)网络的问答立场分析方法。该方法通过模仿人类阅读理解时的思维方式,基于交互注意力机制和循环迭代方法,有效地从问题和答案的相互联系中挖掘问答文本的立场信息。此外,该方法将问题进行陈述化表示,有效地解决疑问句表述下问题文本无法明确表达自身立场的问题。实验结果表明,本文方法取得了比现有模型方法更好的效果,同时证明该方法能有效拟合问答立场分析任务中的问答对依赖关系。

pdf bib
Jointly Learning Aspect-Focused and Inter-Aspect Relations with Graph Convolutional Networks for Aspect Sentiment Analysis
Bin Liang | Rongdi Yin | Lin Gui | Jiachen Du | Ruifeng Xu
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we explore a novel solution of constructing a heterogeneous graph for each instance by leveraging aspect-focused and inter-aspect contextual dependencies for the specific aspect and propose an Interactive Graph Convolutional Networks (InterGCN) model for aspect sentiment analysis. Specifically, an ordinary dependency graph is first constructed for each sentence over the dependency tree. Then we refine the graph by considering the syntactical dependencies between contextual words and aspect-specific words to derive the aspect-focused graph. Subsequently, the aspect-focused graph and the corresponding embedding matrix are fed into the aspect-focused GCN to capture the key aspect and contextual words. Besides, to interactively extract the inter-aspect relations for the specific aspect, an inter-aspect GCN is adopted to model the representations learned by aspect-focused GCN based on the inter-aspect graph which is constructed by the relative dependencies between the aspect words and other aspects. Hence, the model can be aware of the significant contextual and aspect words when interactively learning the sentiment features for a specific aspect. Experimental results on four benchmark datasets illustrate that our proposed model outperforms state-of-the-art methods and substantially boosts the performance in comparison with BERT.

pdf bib
Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems
Jian Wang | Junhao Liu | Wei Bi | Xiaojiang Liu | Kejing He | Ruifeng Xu | Min Yang
Proceedings of the 28th International Conference on Computational Linguistics

Existing end-to-end task-oriented dialog systems struggle to dynamically model long dialog context for interactions and effectively incorporate knowledge base (KB) information into dialog generation. To conquer these limitations, we propose a Dual Dynamic Memory Network (DDMN) for multi-turn dialog generation, which maintains two core components: dialog memory manager and KB memory manager. The dialog memory manager dynamically expands the dialog memory turn by turn and keeps track of dialog history with an updating mechanism, which encourages the model to filter irrelevant dialog history and memorize important newly coming information. The KB memory manager shares the structural KB triples throughout the whole conversation, and dynamically extracts KB information with a memory pointer at each turn. Experimental results on three benchmark datasets demonstrate that DDMN significantly outperforms the strong baselines in terms of both automatic evaluation and human evaluation. Our code is available at https://github.com/siat-nlp/DDMN.

pdf bib
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover’s Distance
Jianquan Li | Xiaokang Liu | Honghong Zhao | Ruifeng Xu | Min Yang | Yaohong Jin
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Pre-trained language models (e.g., BERT) have achieved significant success in various natural language processing (NLP) tasks. However, high storage and computational costs obstruct pre-trained language models to be effectively deployed on resource-constrained devices. In this paper, we propose a novel BERT distillation method based on many-to-many layer mapping, which allows each intermediate student layer to learn from any intermediate teacher layers. In this way, our model can learn from different teacher layers adaptively for different NLP tasks. In addition, we leverage Earth Mover’s Distance (EMD) to compute the minimum cumulative cost that must be paid to transform knowledge from teacher network to student network. EMD enables effective matching for the many-to-many layer mapping. Furthermore, we propose a cost attention mechanism to learn the layer weights used in EMD automatically, which is supposed to further improve the model’s performance and accelerate convergence time. Extensive experiments on GLUE benchmark demonstrate that our model achieves competitive performance compared to strong competitors in terms of both accuracy and model compression

pdf bib
Amalgamating Knowledge from Two Teachers for Task-oriented Dialogue System with Adversarial Training
Wanwei He | Min Yang | Rui Yan | Chengming Li | Ying Shen | Ruifeng Xu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The challenge of both achieving task completion by querying the knowledge base and generating human-like responses for task-oriented dialogue systems is attracting increasing research attention. In this paper, we propose a “Two-Teacher One-Student” learning framework (TTOS) for task-oriented dialogue, with the goal of retrieving accurate KB entities and generating human-like responses simultaneously. TTOS amalgamates knowledge from two teacher networks that together provide comprehensive guidance to build a high-quality task-oriented dialogue system (student network). Each teacher network is trained via reinforcement learning with a goal-specific reward, which can be viewed as an expert towards the goal and transfers the professional characteristic to the student network. Instead of adopting the classic student-teacher learning of forcing the output of a student network to exactly mimic the soft targets produced by the teacher networks, we introduce two discriminators as in generative adversarial network (GAN) to transfer knowledge from two teachers to the student. The usage of discriminators relaxes the rigid coupling between the student and teachers. Extensive experiments on two benchmark datasets (i.e., CamRest and In-Car Assistant) demonstrate that TTOS significantly outperforms baseline methods.

pdf bib
Emotion-Cause Pair Extraction as Sequence Labeling Based on A Novel Tagging Scheme
Chaofa Yuan | Chuang Fan | Jianzhu Bao | Ruifeng Xu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The task of emotion-cause pair extraction deals with finding all emotions and the corresponding causes in unannotated emotion texts. Most recent studies are based on the likelihood of Cartesian product among all clause candidates, resulting in a high computational cost. Targeting this issue, we regard the task as a sequence labeling problem and propose a novel tagging scheme with coding the distance between linked components into the tags, so that emotions and the corresponding causes can be extracted simultaneously. Accordingly, an end-to-end model is presented to process the input texts from left to right, always with linear time complexity, leading to a speed up. Experimental results show that our proposed model achieves the best performance, outperforming the state-of-the-art method by 2.26% (p<0.001) in F1 measure.

pdf bib
The Design and Construction of a Chinese Sarcasm Dataset
Xiaochang Gong | Qin Zhao | Jun Zhang | Ruibin Mao | Ruifeng Xu
Proceedings of the 12th Language Resources and Evaluation Conference

As a typical multi-layered semi-conscious language phenomenon, sarcasm is widely existed in social media text for enhancing the emotion expression. Thus, the detection and processing of sarcasm is important to social media analysis.However, most existing sarcasm dataset are in English and there is still a lack of authoritative Chinese sarcasm dataset. In this paper, we presents the design and construction of a largest high-quality Chinese sarcasm dataset, which contains 2,486 manual annotated sarcastic texts and 89,296 non-sarcastic texts. Furthermore, a balanced dataset through elaborately sampling the same amount non-sarcastic texts for training sarcasm classifier. Using the dataset as the benchmark, some sarcasm classification methods are evaluated.

pdf bib
Target-based Sentiment Annotation in Chinese Financial News
Chaofa Yuan | Yuhan Liu | Rongdi Yin | Jun Zhang | Qinling Zhu | Ruibin Mao | Ruifeng Xu
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents the design and construction of a large-scale target-based sentiment annotation corpus on Chinese financial news text. Different from the most existing paragraph/document-based annotation corpus, in this study, target-based fine-grained sentiment annotation is performed. The companies, brands and other financial entities are regarded as the targets. The clause reflecting the profitability, loss or other business status of financial entities is regarded as the sentiment expression for determining the polarity. Based on high quality annotation guideline and effective quality control strategy, a corpus with 8,314 target-level sentiment annotation is constructed on 6,336 paragraphs from Chinese financial news text. Based on this corpus, several state-of-the-art sentiment analysis models are evaluated.

2019

pdf bib
Neural Topic Model with Reinforcement Learning
Lin Gui | Jia Leng | Gabriele Pergola | Yu Zhou | Ruifeng Xu | Yulan He
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In recent years, advances in neural variational inference have achieved many successes in text processing. Examples include neural topic models which are typically built upon variational autoencoder (VAE) with an objective of minimising the error of reconstructing original documents based on the learned latent topic vectors. However, minimising reconstruction errors does not necessarily lead to high quality topics. In this paper, we borrow the idea of reinforcement learning and incorporate topic coherence measures as reward signals to guide the learning of a VAE-based topic model. Furthermore, our proposed model is able to automatically separating background words dynamically from topic words, thus eliminating the pre-processing step of filtering infrequent and/or top frequent words, typically required for learning traditional topic models. Experimental results on the 20 Newsgroups and the NIPS datasets show superior performance both on perplexity and topic coherence measure compared to state-of-the-art neural topic models.

pdf bib
A Knowledge Regularized Hierarchical Approach for Emotion Cause Analysis
Chuang Fan | Hongyu Yan | Jiachen Du | Lin Gui | Lidong Bing | Min Yang | Ruifeng Xu | Ruibin Mao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Emotion cause analysis, which aims to identify the reasons behind emotions, is a key topic in sentiment analysis. A variety of neural network models have been proposed recently, however, these previous models mostly focus on the learning architecture with local textual information, ignoring the discourse and prior knowledge, which play crucial roles in human text comprehension. In this paper, we propose a new method to extract emotion cause with a hierarchical neural model and knowledge-based regularizations, which aims to incorporate discourse context information and restrain the parameters by sentiment lexicon and common knowledge. The experimental results demonstrate that our proposed method achieves the state-of-the-art performance on two public datasets in different languages (Chinese and English), outperforming a number of competitive baselines by at least 2.08% in F-measure.

pdf bib
A Challenge Dataset and Effective Models for Aspect-Based Sentiment Analysis
Qingnan Jiang | Lei Chen | Ruifeng Xu | Xiang Ao | Min Yang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Aspect-based sentiment analysis (ABSA) has attracted increasing attention recently due to its broad applications. In existing ABSA datasets, most sentences contain only one aspect or multiple aspects with the same sentiment polarity, which makes ABSA task degenerate to sentence-level sentiment analysis. In this paper, we present a new large-scale Multi-Aspect Multi-Sentiment (MAMS) dataset, in which each sentence contains at least two different aspects with different sentiment polarities. The release of this dataset would push forward the research in this field. In addition, we propose simple yet effective CapsNet and CapsNet-BERT models which combine the strengths of recent NLP advances. Experiments on our new dataset show that the proposed model significantly outperforms the state-of-the-art baseline methods

pdf bib
Context-aware Embedding for Targeted Aspect-based Sentiment Analysis
Bin Liang | Jiachen Du | Ruifeng Xu | Binyang Li | Hejiao Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Attention-based neural models were employed to detect the different aspects and sentiment polarities of the same target in targeted aspect-based sentiment analysis (TABSA). However, existing methods do not specifically pre-train reasonable embeddings for targets and aspects in TABSA. This may result in targets or aspects having the same vector representations in different contexts and losing the context-dependent information. To address this problem, we propose a novel method to refine the embeddings of targets and aspects. Such pivotal embedding refinement utilizes a sparse coefficient vector to adjust the embeddings of target and aspect from the context. Hence the embeddings of targets and aspects can be refined from the highly correlative words instead of using context-independent or randomly initialized vectors. Experiment results on two benchmark datasets show that our approach yields the state-of-the-art performance in TABSA task.

2018

pdf bib
Hybrid Neural Attention for Agreement/Disagreement Inference in Online Debates
Di Chen | Jiachen Du | Lidong Bing | Ruifeng Xu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Inferring the agreement/disagreement relation in debates, especially in online debates, is one of the fundamental tasks in argumentation mining. The expressions of agreement/disagreement usually rely on argumentative expressions in text as well as interactions between participants in debates. Previous works usually lack the capability of jointly modeling these two factors. To alleviate this problem, this paper proposes a hybrid neural attention model which combines self and cross attention mechanism to locate salient part from textual context and interaction between users. Experimental results on three (dis)agreement inference datasets show that our model outperforms the state-of-the-art models.

pdf bib
Variational Autoregressive Decoder for Neural Response Generation
Jiachen Du | Wenjie Li | Yulan He | Ruifeng Xu | Lidong Bing | Xuan Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Combining the virtues of probability graphic models and neural networks, Conditional Variational Auto-encoder (CVAE) has shown promising performance in applications such as response generation. However, existing CVAE-based models often generate responses from a single latent variable which may not be sufficient to model high variability in responses. To solve this problem, we propose a novel model that sequentially introduces a series of latent variables to condition the generation of each word in the response sequence. In addition, the approximate posteriors of these latent variables are augmented with a backward Recurrent Neural Network (RNN), which allows the latent variables to capture long-term dependencies of future tokens in generation. To facilitate training, we supplement our model with an auxiliary objective that predicts the subsequent bag of words. Empirical experiments conducted on Opensubtitle and Reddit datasets show that the proposed model leads to significant improvement on both relevance and diversity over state-of-the-art baselines.

pdf bib
The UIR Uncertainty Corpus for Chinese: Annotating Chinese Microblog Corpus for Uncertainty Identification from Social Media
Binyang Li | Jun Xiang | Le Chen | Xu Han | Xiaoyan Yu | Ruifeng Xu | Tengjiao Wang | Kam-fai Wong
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
A Question Answering Approach for Emotion Cause Extraction
Lin Gui | Jiannan Hu | Yulan He | Ruifeng Xu | Qin Lu | Jiachen Du
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Emotion cause extraction aims to identify the reasons behind a certain emotion expressed in text. It is a much more difficult task compared to emotion classification. Inspired by recent advances in using deep memory networks for question answering (QA), we propose a new approach which considers emotion cause identification as a reading comprehension task in QA. Inspired by convolutional neural networks, we propose a new mechanism to store relevant context in different memory slots to model context information. Our proposed approach can extract both word level sequence features and lexical features. Performance evaluation shows that our method achieves the state-of-the-art performance on a recently released emotion cause dataset, outperforming a number of competitive baselines by at least 3.01% in F-measure.

2016

pdf bib
Event-Driven Emotion Cause Extraction with Corpus Construction
Lin Gui | Dongyin Wu | Ruifeng Xu | Qin Lu | Yu Zhou
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

pdf bib
Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering
Tao Chen | Ruifeng Xu | Yulan He | Xuan Wang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
A Joint Model for Chinese Microblog Sentiment Analysis
Yuhui Cao | Zhao Chen | Ruifeng Xu | Tao Chen | Lin Gui
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

2014

pdf bib
Personal Attributes Extraction in Chinese Text Bakeoff in CLP 2014: Overview
Ruifeng Xu | Shuai Wang | Feng Shi | Jian Xu
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Automatic Labelling of Topic Models Learned from Twitter by Summarisation
Amparo Elizabeth Cano Basave | Yulan He | Ruifeng Xu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Cross-lingual Opinion Analysis via Negative Transfer Detection
Lin Gui | Ruifeng Xu | Qin Lu | Jun Xu | Jian Xu | Bin Liu | Xiaolong Wang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Web Information Mining and Decision Support Platform for the Modern Service Industry
Binyang Li | Lanjun Zhou | Zhongyu Wei | Kam-fai Wong | Ruifeng Xu | Yunqing Xia
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

pdf bib
Incorporating Rule-based and Statistic-based Techniques for Coreference Resolution
Ruifeng Xu | Jun Xu | Jie Liu | Chengxiang Liu | Chengtian Zou | Lin Gui | Yanzhen Zheng | Peng Qu
Joint Conference on EMNLP and CoNLL - Shared Task

pdf bib
Explore Chinese Encyclopedic Knowledge to Disambiguate Person Names
Jie Liu | Ruifeng Xu | Qin Lu | Jian Xu
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

2011

pdf bib
Diversifying Information Needs in Results of Question Retrieval
Yaoyun Zhang | Xiaolong Wang | Xuan Wang | Ruifeng Xu | Jun Xu | ShiXi Fan
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Instance Level Transfer Learning for Cross Lingual Opinion Analysis
Ruifeng Xu | Jun Xu | Xiaolong Wang
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

2010

pdf bib
HITSZ_CITYU: Combine Collocation, Context Words and Neighboring Sentence Sentiment in Sentiment Adjectives Disambiguation
Ruifeng Xu | Jun Xu | Chunyu Kit
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Combine Person Name and Person Identity Recognition and Document Clustering for Chinese Person Name Disambiguation
Ruifeng Xu | Jun Xu | Xiangying Dai | Chunyu Kit
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2008

pdf bib
Opinion Annotation in On-line Chinese Product Reviews
Ruifeng Xu | Yunqing Xia | Kam-Fai Wong | Wenjie Li
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the design and construction of a Chinese opinion corpus based on the online product reviews. Based on the observation on the characteristics of opinion expression in Chinese online product reviews, which is quite different from in the formal texts such as news, an annotation framework is proposed to guide the construction of the first Chinese opinion corpus based on online product reviews. The opinionated sentences are manually identified from the review text. Furthermore, for each comment in the opinionated sentence, its 13 describing elements are annotated including the expressions related to the interested product attributes and user opinions as well as the polarity and degree of the opinions. Currently, 12,724 comments are annotated in 10,935 sentences from review text. Through statistical analysis on the opinion corpus, some interesting characteristics of Chinese opinion expression are presented. This corpus is shown helpful to support systematic research on Chinese opinion analysis.

2007

pdf bib
Annotating Chinese Collocations with Multi Information
Ruifeng Xu | Qin Lu | Kam-Fai Wong | Wenjie Li
Proceedings of the Linguistic Annotation Workshop

2006

pdf bib
Interaction between Lexical Base and Ontology with Formal Concept Analysis
Sujian Li | Qin Lu | Wenjie Li | Ruifeng Xu
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

An ontology describes conceptual knowledge in a specific domain. A lexical base collects a repository of words and gives independent definition of concepts. In this paper, we propose to use FCA as a tool to help constructing an ontology through an existing lexical base. We mainly address two issues. The first issue is how to select attributes to visualize the relations between lexical terms. The second issue is how to revise lexical definitions through analysing the relations in the ontology. Thus the focus is on the effect of interaction between a lexical base and an ontology for the purpose of good ontology construction. Finally, experiments have been conducted to verify our ideas.

pdf bib
The Design and Construction of A Chinese Collocation Bank
Ruifeng Xu | Qin Lu | Sujian Li
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents an annotated Chinese collocation bank developed at the Hong Kong Polytechnic University. The definition of collocation with good linguistic consistency and good computational operability is first discussed and the properties of collocations are then presented. Secondly, based on the combination of different properties, collocations are classified into four types. Thirdly, the annotation guideline is presented. Fourthly, the implementation issues for collocation bank construction are addressed including the annotation with categorization, dependency and contextual information. Currently, the collocation bank is completed for 3,643 headwords in a 5-million-word corpus.

2005

pdf bib
Similarity Based Chinese Synonym Collocation Extraction
Wanyin Li | Qin Lu | Ruifeng Xu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 1, March 2005

pdf bib
The Design and Construction of the PolyU Shallow Treebank
Ruifeng Xu | Qin Lu | Yin Li | Wanyin Li
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 3, September 2005: Special Issue on Selected Papers from ROCLING XVI

2004

pdf bib
Using Synonym Relations in Chinese Collocation Extraction
Wanyin Li | Qin Lu | Ruifeng Xu
Proceedings of the Third SIGHAN Workshop on Chinese Language Processing

pdf bib
The Construction of A Chinese Shallow Treebank
Ruifeng Xu | Qin Lu | Yin Li | Wanyin Li
Proceedings of the Third SIGHAN Workshop on Chinese Language Processing