Xueqi Cheng

Also published as: Xue-Qi Cheng


Prompt Tuning with Contradictory Intentions for Sarcasm Recognition
Yiyi Liu | Ruqing Zhang | Yixing Fan | Jiafeng Guo | Xueqi Cheng
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Recently, prompt tuning has achieved promising results in a variety of natural language processing (NLP) tasks. The typical approach is to insert text pieces (i.e., templates) into the input and transform downstream tasks into the same form as pre-training. In essence, a high-quality template is the foundation of prompt tuning, since it supports the performance of the converted cloze-style task. However, for sarcasm recognition, determining appropriate templates and label words is time-consuming and requires increasingly sophisticated domain knowledge because of the highly figurative nature of sarcasm. In this work, we propose SarcPrompt, which incorporates prior knowledge about contradictory intentions into prompt tuning for sarcasm recognition. SarcPrompt is inspired by the observation that, in sarcastic text, the speaker usually says the opposite of what they actually mean. Based on this idea, we explicitly mimic the actual intention through prompt construction and indicate whether the actual intention contradicts the literal content through verbalizer engineering. Experiments on three public datasets under standard and low-resource settings demonstrate the effectiveness of SarcPrompt for sarcasm recognition.

BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval
Shicheng Xu | Liang Pang | Huawei Shen | Xueqi Cheng
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dense retrieval has shown promise in first-stage retrieval when trained on in-domain labeled datasets. However, previous studies have found that dense retrieval generalizes poorly to unseen domains because it weakly models domain-invariant and interpretable features (i.e., the matching signal between two texts, which is the essence of information retrieval). In this paper, we propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal. Fully fine-grained expression and query-oriented saliency are two properties of the matching signal. Thus, in BERM, a single passage is segmented into multiple units, and two unit-level requirements on the representation are imposed as constraints during training to obtain an effective matching signal. One is semantic unit balance and the other is essential matching unit extractability. The unit-level view and balanced semantics make the representation express the text in a fine-grained manner. Essential matching unit extractability makes the passage representation sensitive to the given query, so that pure matching information can be extracted from a passage with complex context. Experiments on BEIR show that our method can be effectively combined with different dense retrieval training methods (vanilla, hard negative mining, and knowledge distillation) to improve their generalization ability without any additional inference overhead or target-domain data.

SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation
Junkai Zhou | Liang Pang | Huawei Shen | Xueqi Cheng
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue. However, for the persona-based dialogue generation task, consistency and coherence are also key factors, and they remain great challenges for language models. Existing works mainly focus on data filtering, model structure modification, or objective function design, but their improvements are limited and hard to generalize to all types of pre-trained language models. In contrast, we find that language models can produce consistent and coherent responses if enough generations are considered. Thus, the problems lie in large-scale response generation and target response selection. In this work, we propose SimOAP, a simple but effective two-stage strategy consisting of over-sampling and post-evaluation. The over-sampling stage efficiently obtains a large number of responses from existing trained models via off-the-shelf distillation and compression methods, and the post-evaluation stage selects a good response from the large pool of candidates based on multiple well-designed evaluation metrics. Experimental results show that the proposed plug-in SimOAP strategy improves the backbone models and outperforms the baseline strategies in both automatic and human evaluations.
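
The two-stage idea can be illustrated with a minimal Python sketch. It is not the SimOAP implementation: `sample` stands in for any stochastic decoder, the metrics are hypothetical placeholders for the paper's evaluation metrics, and summing the metric scores is our own assumption about how they are combined.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces (not from the paper): a sampler draws one response
# for a dialogue context; a metric scores a (context, response) pair.
Sampler = Callable[[str], str]
Metric = Callable[[str, str], float]


def over_sample(context: str, sample: Sampler, n: int) -> List[str]:
    """Stage 1: draw n responses and keep the distinct ones in draw order."""
    candidates: List[str] = []
    for _ in range(n):
        response = sample(context)
        if response not in candidates:
            candidates.append(response)
    return candidates


def post_evaluate(context: str, candidates: Sequence[str],
                  metrics: Sequence[Metric]) -> str:
    """Stage 2: pick the candidate with the highest summed metric score."""
    def total(response: str) -> float:
        return sum(metric(context, response) for metric in metrics)

    return max(candidates, key=total)


if __name__ == "__main__":
    import itertools

    # Toy deterministic "sampler" cycling through canned responses.
    pool = itertools.cycle(["I don't know.", "I love hiking on weekends.", "Yes."])
    candidates = over_sample("Do you like hiking?", lambda ctx: next(pool), n=6)
    # Toy metric: prefer longer responses (a stand-in for real scorers).
    length = lambda ctx, r: float(len(r.split()))
    print(post_evaluate("Do you like hiking?", candidates, [length]))
```

In a real setting the candidate pool would be large, which is why the abstract pairs over-sampling with distillation and compression to keep generation efficient.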

Semantic Structure Enhanced Event Causality Identification
Zhilei Hu | Zixuan Li | Xiaolong Jin | Long Bai | Saiping Guan | Jiafeng Guo | Xueqi Cheng
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Event Causality Identification (ECI) aims to identify causal relations between events in unstructured texts. This is a very challenging task, because causal relations are usually expressed by implicit associations between events. Existing methods usually capture such associations by directly modeling the texts with pre-trained language models, which underestimates two kinds of semantic structures vital to the ECI task, namely, event-centric structure and event-associated structure. The former includes important semantic elements related to the events that describe them more precisely, while the latter contains semantic paths between two events that provide possible support for ECI. In this paper, we study the implicit associations between events by modeling the above explicit semantic structures, and propose a Semantic Structure Integration model (SemSIn). It uses a GNN-based event aggregator to integrate the event-centric structure information, and employs an LSTM-based path aggregator to capture the event-associated structure information between two events. Experimental results on three widely used datasets show that SemSIn achieves significant improvements over baseline methods.


Complex Evolutional Pattern Learning for Temporal Knowledge Graph Reasoning
Zixuan Li | Saiping Guan | Xiaolong Jin | Weihua Peng | Yajuan Lyu | Yong Zhu | Long Bai | Wei Li | Jiafeng Guo | Xueqi Cheng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A Temporal Knowledge Graph (TKG) is a sequence of KGs corresponding to different timestamps. TKG reasoning aims to predict potential future facts given the historical KG sequences. One key to this task is mining and understanding the evolutional patterns of facts from these sequences. The evolutional patterns are complex in two aspects: length diversity and time variability. Existing models for TKG reasoning focus on modeling fact sequences of a fixed length and therefore cannot discover complex evolutional patterns that vary in length. Furthermore, these models are all trained offline and thus cannot adapt well to subsequent changes in the evolutional patterns. To address these issues, we propose a new model, called Complex Evolutional Network (CEN), which uses a length-aware Convolutional Neural Network (CNN) to handle evolutional patterns of different lengths via an easy-to-difficult curriculum learning strategy. In addition, we propose to learn the model in an online setting so that it can adapt to changes in the evolutional patterns over time. Extensive experiments demonstrate that CEN obtains substantial performance improvements under both the traditional offline setting and the proposed online setting.

MetaSLRCL: A Self-Adaptive Learning Rate and Curriculum Learning Based Framework for Few-Shot Text Classification
Kailin Zhao | Xiaolong Jin | Saiping Guan | Jiafeng Guo | Xueqi Cheng
Proceedings of the 29th International Conference on Computational Linguistics

Due to the lack of labeled data in many realistic scenarios, a number of few-shot learning methods for text classification have been proposed, among which meta learning based methods have recently attracted much attention. Such methods usually consist of a learner that serves as the classifier and a meta learner that specializes the learner to different tasks. For the learner, the learning rate is crucial to its performance. However, existing methods treat it as a hyperparameter and adjust it manually, which is time-consuming and laborious. Intuitively, the learning rates should be different and self-adaptive for different tasks and neural network layers. The meta learner, in turn, requires good generalization ability so that it can quickly adapt to new tasks. Motivated by these issues, we propose a novel meta learning framework, called MetaSLRCL, for few-shot text classification. Specifically, we present a novel meta learning mechanism that obtains different learning rates for different tasks and neural network layers, enabling the learner to quickly adapt to new training data. Moreover, we propose a task-oriented curriculum learning mechanism that helps the meta learner achieve better generalization ability by learning from tasks of increasing difficulty. Extensive experiments on three benchmark datasets demonstrate the effectiveness of MetaSLRCL.

Meta-CQG: A Meta-Learning Framework for Complex Question Generation over Knowledge Bases
Kun Zhang | Yunqi Qiu | Yuanzhuo Wang | Long Bai | Wei Li | Xuhui Jiang | Huawei Shen | Xueqi Cheng
Proceedings of the 29th International Conference on Computational Linguistics

Complex question generation over knowledge bases (KB) aims to generate natural language questions involving multiple KB relations or functional constraints. Existing methods train one encoder-decoder-based model to fit all questions. However, such a one-size-fits-all strategy may not perform well, since complex questions exhibit an uneven distribution along many dimensions, such as question types, involved KB relations, and query structures, resulting in insufficient learning for long-tailed samples under different dimensions. To address this problem, we propose a meta-learning framework for complex question generation. The meta-trained generator can acquire universal and transferable meta-knowledge and quickly adapt to long-tailed samples through a few of the most related training samples. To retrieve similar samples for each input query, we design a self-supervised graph retriever to learn distributed representations for samples, and leverage contrastive learning to improve the learned representations. We conduct experiments on both WebQuestionsSP and ComplexWebQuestions, and the results on long-tailed samples of different dimensions improve significantly, which demonstrates the effectiveness of the proposed framework.

Visual Named Entity Linking: A New Dataset and A Baseline
Wen Sun | Yixing Fan | Jiafeng Guo | Ruqing Zhang | Xueqi Cheng
Findings of the Association for Computational Linguistics: EMNLP 2022

Visual Entity Linking (VEL) is the task of linking regions of images to their corresponding entities in Knowledge Bases (KBs), which benefits many computer vision tasks such as image retrieval, image captioning, and visual question answering. However, existing VEL tasks either rely on textual data to complement multi-modal linking or only link objects to general entities, and thus cannot perform named entity linking on large amounts of image data. In this paper, we consider a purely Visual-based Named Entity Linking (VNEL) task, where the input consists only of an image. The task is to identify objects of interest (i.e., visual entity mentions) in images and link them to the corresponding named entities in KBs. Since each entity in a KB often contains rich visual and textual information, we propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL). In addition, we present a high-quality human-annotated visual person linking dataset, named WIKIPerson. Based on WIKIPerson, we establish a series of baseline algorithms for each sub-task, and conduct experiments to verify the quality of the proposed dataset and the effectiveness of the baseline methods. We hope this work will encourage more research on VNEL in the future. The code and datasets are publicly available at https://github.com/ict-bigdatalab/VNEL.

Knowledge-Enhanced Self-Supervised Prototypical Network for Few-Shot Event Detection
Kailin Zhao | Xiaolong Jin | Long Bai | Jiafeng Guo | Xueqi Cheng
Findings of the Association for Computational Linguistics: EMNLP 2022

Prototypical network based joint methods, which carry out event detection in a unified sequence tagging framework, have attracted much attention in few-shot event detection. However, these methods suffer from the inaccurate prototype representation problem for two main reasons: the number of instances for calculating prototypes is limited, and the methods do not capture the relationships among event prototypes well. To deal with this problem, we propose a Knowledge-Enhanced self-supervised Prototypical Network, called KE-PN, for few-shot event detection. KE-PN adopts hybrid rules that automatically align event types to an external knowledge base, i.e., FrameNet, to obtain more instances. It proposes a self-supervised learning method to filter out noisy data from the enhanced instances. KE-PN is further equipped with an auxiliary event type relationship classification module, which injects the relationship information into the representations of event prototypes. Extensive experiments on three benchmark datasets, i.e., FewEvent, MAVEN, and ACE2005, demonstrate the state-of-the-art performance of KE-PN.
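
For readers unfamiliar with prototypical networks, the core computation that KE-PN builds on can be sketched in a few lines of Python: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype under squared Euclidean distance. This is the generic technique, not KE-PN itself; the knowledge enhancement, noise filtering and relationship module are omitted, and the toy embeddings and event-type labels are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def prototypes(support: Dict[str, Sequence[Vector]]) -> Dict[str, Vector]:
    """Prototype of each class = element-wise mean of its support embeddings."""
    protos: Dict[str, Vector] = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Sequence[float], protos: Dict[str, Vector]) -> str:
    """Assign the query to the class whose prototype is nearest."""
    return min(protos, key=lambda label: squared_distance(query, protos[label]))


if __name__ == "__main__":
    # Hypothetical 2-d embeddings for two event types, two shots each.
    support = {"Attack": [[1.0, 0.0], [3.0, 0.0]], "Meet": [[0.0, 2.0], [0.0, 4.0]]}
    print(classify([1.5, 0.5], prototypes(support)))
```

With only a handful of support instances per type, each prototype is a noisy mean, which is the "limited number of instances" problem the abstract describes.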

HiSMatch: Historical Structure Matching based Temporal Knowledge Graph Reasoning
Zixuan Li | Zhongni Hou | Saiping Guan | Xiaolong Jin | Weihua Peng | Long Bai | Yajuan Lyu | Wei Li | Jiafeng Guo | Xueqi Cheng
Findings of the Association for Computational Linguistics: EMNLP 2022

A Temporal Knowledge Graph (TKG) is a sequence of KGs with respective timestamps, which uses quadruples of the form (subject, relation, object, timestamp) to describe dynamic facts. TKG reasoning has facilitated many real-world applications by answering queries about the future, such as (query entity, query relation, ?, future timestamp). This is essentially a matching task between a query and candidate entities based on their historical structures, which reflect the behavioral trends of the entities at different timestamps. In addition, recent KGs provide background knowledge of all the entities, which is also helpful for the matching. Thus, in this paper, we propose the Historical Structure Matching (HiSMatch) model. It applies two structure encoders to capture the semantic information contained in the historical structures of the query and candidate entities. It also adopts another encoder to integrate the background knowledge into the model. TKG reasoning experiments on six benchmark datasets demonstrate the significant improvement of the proposed HiSMatch model, with up to 5.6% performance improvement in MRR compared to the state-of-the-art baselines.
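
MRR, the metric in which the improvement above is reported, averages the reciprocal rank of the correct object entity over all queries of the form (subject, relation, ?, timestamp). A minimal Python sketch of the raw (unfiltered) metric follows; the candidate scores stand in for any model's output, the entity names are hypothetical, and the filtered variant often used in TKG evaluation is not shown.

```python
from typing import Dict, NamedTuple, Sequence


class Quadruple(NamedTuple):
    """A TKG fact (subject, relation, object, timestamp)."""
    subject: str
    relation: str
    obj: str
    timestamp: int


def reciprocal_rank(scores: Dict[str, float], gold: str) -> float:
    """1 / rank of the gold entity among candidates sorted by descending score.

    Ties keep the dict's insertion order, because Python's sort is stable.
    """
    ranked = sorted(scores, key=lambda entity: scores[entity], reverse=True)
    return 1.0 / (ranked.index(gold) + 1)


def mean_reciprocal_rank(all_scores: Sequence[Dict[str, float]],
                         golds: Sequence[str]) -> float:
    """Average reciprocal rank over queries (one score dict per query)."""
    ranks = [reciprocal_rank(s, g) for s, g in zip(all_scores, golds)]
    return sum(ranks) / len(ranks)
```

For example, if the gold object is ranked first for one query and second for another, MRR is (1 + 1/2) / 2 = 0.75.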


Transductive Learning for Unsupervised Text Style Transfer
Fei Xiao | Liang Pang | Yanyan Lan | Yan Wang | Huawei Shen | Xueqi Cheng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Unsupervised style transfer models are mainly based on an inductive learning approach, which represents the style as embeddings, decoder parameters, or discriminator parameters and directly applies these general rules to the test cases. However, the lack of parallel corpora hinders the ability of these inductive learning methods on this task. As a result, they are likely to produce severely inconsistent style expressions, such as ‘the salad is rude’. To tackle this problem, we propose in this paper a novel transductive learning approach based on a retrieval-based context-aware style representation. Specifically, an attentional encoder-decoder with a retriever framework is used, which involves the top-K relevant sentences in the target style in the transfer process. In this way, we can learn a context-aware style embedding to alleviate the above inconsistency problem. Both sparse (BM25) and dense (MIPS) retrieval functions are used, and two objective functions are designed to facilitate joint learning. Experimental results show that our method outperforms several strong baselines. The proposed transductive learning approach is general and effective for the task of unsupervised style transfer, and we will apply it to the other two typical methods in the future.
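
The sparse retriever named above, BM25, ranks candidate target-style sentences by weighted term overlap with the input. A minimal Python sketch of Okapi BM25 top-K retrieval over pre-tokenized sentences follows; the values k1 = 1.5 and b = 0.75 and the non-negative IDF variant are common defaults we assume, not settings taken from the paper, and the toy corpus is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def bm25_top_k(query: Sequence[str], corpus: Sequence[Sequence[str]], k: int,
               k1: float = 1.5, b: float = 0.75) -> List[Tuple[int, float]]:
    """Return (sentence_index, score) for the k highest-scoring sentences."""
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: number of sentences containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    scored = []
    for i, doc in enumerate(corpus):
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # "+ 1.0" inside the log keeps IDF non-negative for common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((i, score))
    # Stable sort: equal scores keep corpus order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    positive_style = [["the", "food", "was", "delicious"],
                      ["the", "service", "was", "friendly"],
                      ["the", "salad", "was", "fresh", "and", "delicious"]]
    print(bm25_top_k(["the", "salad", "was", "bad"], positive_style, k=2))
```

Retrieving "the salad was fresh and delicious" for the input "the salad was bad" illustrates the intuition: the retrieved neighbour shows which target-style words fit this particular context.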

Adaptive Information Seeking for Open-Domain Question Answering
Yunchang Zhu | Liang Pang | Yanyan Lan | Huawei Shen | Xueqi Cheng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Information seeking is an essential step for open-domain question answering to efficiently gather evidence from a large corpus. Recently, iterative approaches have been shown to be effective for complex questions by recursively retrieving new evidence at each step. However, almost all existing iterative approaches use predefined strategies, either applying the same retrieval function multiple times or fixing the order of different retrieval functions, which cannot fulfill the diverse requirements of various questions. In this paper, we propose a novel adaptive information-seeking strategy for open-domain question answering, namely AISO. Specifically, the whole retrieval and answer process is modeled as a partially observed Markov decision process, where three types of retrieval operations (i.e., BM25, DPR, and hyperlink) and one answer operation are defined as actions. According to the learned policy, AISO can adaptively select a proper retrieval action to seek the missing evidence at each step, based on the collected evidence and the reformulated query, or directly output the answer when the evidence set is sufficient for the question. Experiments on SQuAD Open and HotpotQA fullwiki, which serve as single-hop and multi-hop open-domain QA benchmarks, show that AISO outperforms all baseline methods with predefined strategies in terms of both retrieval and answer evaluations.

Integrating Deep Event-Level and Script-Level Information for Script Event Prediction
Long Bai | Saiping Guan | Jiafeng Guo | Zixuan Li | Xiaolong Jin | Xueqi Cheng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Scripts are structured sequences of events, together with their participants, extracted from texts. Script event prediction aims to predict the subsequent event given the historical events in the script. Two kinds of information facilitate this task, namely, event-level information and script-level information. At the event level, existing studies view an event as a verb with its participants, while neglecting other useful properties, such as the state of the participants. At the script level, most existing studies only consider a single event sequence corresponding to one common protagonist. In this paper, we propose a Transformer-based model, called MCPredictor, which integrates deep event-level and script-level information for script event prediction. At the event level, MCPredictor uses the rich information in the text to obtain more comprehensive event semantic representations. At the script level, it considers multiple event sequences corresponding to different participants of the subsequent event. Experimental results on the widely used New York Times corpus demonstrate the effectiveness and superiority of the proposed model.

Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs
Zixuan Li | Xiaolong Jin | Saiping Guan | Wei Li | Jiafeng Guo | Yuanzhuo Wang | Xueqi Cheng
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Temporal Knowledge Graphs (TKGs) have been developed and used in many different areas. Reasoning on TKGs to predict potential future facts (events) poses great challenges to existing models. When facing a prediction task, human beings usually search their memories for useful historical information (i.e., clues) and then reason carefully about the future. Inspired by this mechanism, we propose CluSTeR, which predicts future facts in two stages: Clue Searching and Temporal Reasoning. Specifically, at the clue searching stage, CluSTeR learns a beam search policy via reinforcement learning (RL) to induce multiple clues from historical facts. At the temporal reasoning stage, it adopts a sequence method based on a graph convolution network to deduce answers from the clues. Experiments on four datasets demonstrate the substantial advantages of CluSTeR compared with the state-of-the-art methods. Moreover, the clues found by CluSTeR further provide interpretability for the results.


Beyond Language: Learning Commonsense from Images for Reasoning
Wanqing Cui | Yanyan Lan | Liang Pang | Jiafeng Guo | Xueqi Cheng
Findings of the Association for Computational Linguistics: EMNLP 2020

This paper proposes a novel approach to learn commonsense from images, instead of limited raw texts or costly constructed knowledge bases, for the commonsense reasoning problem in NLP. Our motivation comes from the fact that an image is worth a thousand words, where richer scene information could be leveraged to help distill the commonsense knowledge, which is often hidden in language. Our approach, namely Loire, consists of two stages. In the first stage, a bi-modal sequence-to-sequence approach is used to conduct the scene layout generation task, based on a text representation model, ViBERT. In this way, the required visual scene knowledge, such as spatial relations, is encoded in ViBERT through supervised learning on bi-modal data such as COCO. Then ViBERT is concatenated with a pre-trained language model to perform the downstream commonsense reasoning tasks. Experimental results on two commonsense reasoning problems, i.e., commonsense question answering and pronoun resolution, demonstrate that Loire outperforms traditional language-based methods. We also give some case studies to show what knowledge is learned from images and explain how the generated scene layout helps the commonsense reasoning process.

NeuInfer: Knowledge Inference on N-ary Facts
Saiping Guan | Xiaolong Jin | Jiafeng Guo | Yuanzhuo Wang | Xueqi Cheng
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Knowledge inference on knowledge graphs, which aims to find implied valid facts in a knowledge graph and is very helpful for improving the performance of many downstream applications, has attracted extensive attention. However, researchers have mainly focused on knowledge inference on binary facts. Studies on n-ary facts are relatively scarce, although such facts are also ubiquitous in the real world. Therefore, this paper addresses knowledge inference on n-ary facts. We represent each n-ary fact as a primary triple coupled with a set of its auxiliary descriptive attribute-value pair(s). We further propose a neural network model, NeuInfer, for knowledge inference on n-ary facts. Besides handling the common task of inferring an unknown element in a whole fact, NeuInfer can cope with a new type of task, flexible knowledge inference. It aims to infer an unknown element in a partial fact consisting of the primary triple coupled with any number of its auxiliary description(s). Experimental results demonstrate the remarkable superiority of NeuInfer.
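
The representation described above (a primary triple plus auxiliary attribute-value pairs) and the notion of a partial fact used in flexible knowledge inference can be sketched as a small Python data structure. The class, method and example names are hypothetical, and the sketch covers only the data format, not the NeuInfer model; masking only the tail is a simplification, since the unknown element could also be another part of the fact.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class NaryFact:
    """An n-ary fact: a primary triple plus auxiliary attribute-value pairs."""
    head: str
    relation: str
    tail: str
    auxiliary: Tuple[Tuple[str, str], ...] = ()

    def partial(self, attributes: Tuple[str, ...]) -> "NaryFact":
        """Keep the primary triple and only the listed auxiliary attributes."""
        kept = tuple((a, v) for a, v in self.auxiliary if a in attributes)
        return replace(self, auxiliary=kept)

    def mask_tail(self) -> "NaryFact":
        """Turn the fact into a query whose tail entity is unknown."""
        return replace(self, tail="?")


if __name__ == "__main__":
    fact = NaryFact("PersonA", "educated_at", "UniversityB",
                    (("degree", "PhD"), ("major", "Physics")))
    # Flexible inference query: primary triple plus only one auxiliary pair.
    print(fact.partial(("degree",)).mask_tail())
```

Storing the auxiliary pairs as a tuple keeps the dataclass hashable, so facts can be placed in sets when deduplicating training data.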

Event Coreference Resolution with their Paraphrases and Argument-aware Embeddings
Yutao Zeng | Xiaolong Jin | Saiping Guan | Jiafeng Guo | Xueqi Cheng
Proceedings of the 28th International Conference on Computational Linguistics

Event coreference resolution aims to classify all event mentions that refer to the same real-world event into the same group, which is necessary for information aggregation and many downstream applications. To resolve event coreference, existing methods usually calculate the similarities between event mentions and between specific kinds of event arguments. However, they fail to accurately identify paraphrase relations between events and may suffer from error propagation while extracting event components (i.e., event mentions and their arguments). Therefore, we propose a new model based on Event-specific Paraphrases and Argument-aware Semantic Embeddings, called EPASE, for event coreference resolution. EPASE recognizes deep paraphrase relations in an event-specific context of sentences and can cover event paraphrases in more situations, leading to better generalization. Additionally, the embeddings of argument roles are encoded into the event embedding without relying on a fixed number and type of arguments, which gives EPASE better scalability. Experiments on both within- and cross-document event coreference demonstrate its consistent and significant superiority compared to existing methods.


ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation
Hainan Zhang | Yanyan Lan | Liang Pang | Jiafeng Guo | Xueqi Cheng
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In multi-turn dialogue generation, the response is usually related to only a few contexts. Therefore, an ideal model should be able to detect these relevant contexts and produce a suitable response accordingly. However, the widely used hierarchical recurrent encoder-decoder models treat all the contexts indiscriminately, which may hurt the subsequent response generation process. Some researchers try to use cosine similarity or the traditional attention mechanism to find the relevant contexts, but these approaches suffer from either an insufficient relevance assumption or the position bias problem. In this paper, we propose a new model, named ReCoSa, to tackle this problem. First, a word-level LSTM encoder is used to obtain the initial representation of each context. Then, the self-attention mechanism is used to update both the context and masked response representations. Finally, the attention weights between each context and the response representations are computed and used in the further decoding process. Experimental results on both a Chinese customer service dataset and the English Ubuntu dialogue dataset show that ReCoSa significantly outperforms baseline models in terms of both metric-based and human evaluations. Further analysis of the attention shows that the relevant contexts detected by ReCoSa are highly consistent with human understanding, validating the correctness and interpretability of ReCoSa.
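
Weighting each context by its relevance to a response representation follows the standard scaled dot-product attention pattern. The Python sketch below shows that pattern for a single query vector over a few context vectors; it is a generic illustration, not ReCoSa's architecture (there is no LSTM encoder, multi-head projection or masking here), and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one query over a set of contexts.

    Returns the attention weights (one per context) and the weighted sum of
    the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


if __name__ == "__main__":
    # Two hypothetical context vectors; the query is closer to the first one.
    contexts = [[4.0, 0.0], [0.0, 4.0]]
    weights, summary = attend([1.0, 0.0], contexts, contexts)
    print(weights, summary)
```

Unlike cosine similarity, the weights are normalized jointly across all contexts, so a highly relevant context lowers the weight given to the others.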

Soft Contextual Data Augmentation for Neural Machine Translation
Fei Gao | Jinhua Zhu | Lijun Wu | Yingce Xia | Tao Qin | Xueqi Cheng | Wengang Zhou | Tie-Yan Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited. In this paper, we present a novel data augmentation method for neural machine translation. Different from previous augmentation methods that randomly drop, swap or replace words with other words in a sentence, we softly augment a randomly chosen word in a sentence by its contextual mixture of multiple related words. More precisely, we replace the one-hot representation of a word by a distribution (provided by a language model) over the vocabulary, i.e., we replace the embedding of this word by a weighted combination of multiple semantically similar words. Since the weights of those words depend on the contextual information of the word to be replaced, the newly generated sentences capture much richer information than previous augmentation methods. Experimental results on both small-scale and large-scale machine translation datasets demonstrate the superiority of our method over strong baselines.
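
The replacement described above amounts to taking the expected embedding under the language model's distribution: the soft embedding is the sum over vocabulary items i of p_i times E_i, where p_i is the probability of item i and E_i its embedding row (notation ours). A minimal Python sketch, assuming the distribution is already available as a list of probabilities (how it is obtained from the language model is not shown) and using a hypothetical three-word vocabulary:

```python
from typing import List, Sequence

Vector = List[float]


def soft_embedding(probs: Sequence[float],
                   embedding: Sequence[Sequence[float]]) -> Vector:
    """Expected embedding: sum_i probs[i] * embedding[i]."""
    if len(probs) != len(embedding):
        raise ValueError("need one probability per vocabulary row")
    dim = len(embedding[0])
    return [sum(p * row[j] for p, row in zip(probs, embedding)) for j in range(dim)]


def augment(token_ids: Sequence[int], position: int, probs: Sequence[float],
            embedding: Sequence[Sequence[float]]) -> List[Vector]:
    """Embed a sentence, replacing the token at `position` by its soft embedding.

    The caller chooses `position` (the method picks a word at random).
    """
    rows = [list(embedding[t]) for t in token_ids]
    rows[position] = soft_embedding(probs, embedding)
    return rows


if __name__ == "__main__":
    vocab_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Hypothetical LM distribution over the 3-word vocabulary at position 1.
    print(augment([0, 2], 1, [0.5, 0.5, 0.0], vocab_embeddings))
```

A one-hot distribution recovers the ordinary embedding lookup, so the hard word embedding is a special case of the soft one.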

Event Detection with Multi-Order Graph Convolution and Aggregated Attention
Haoran Yan | Xiaolong Jin | Xiangbin Meng | Jiafeng Guo | Xueqi Cheng
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Syntactic relations are broadly used in many NLP tasks. For event detection, syntactic relation representations based on dependency trees can better capture the interrelations between candidate trigger words and related entities than sentence representations. However, existing studies only use first-order syntactic relations (i.e., the arcs) in dependency trees to identify trigger words. For this reason, this paper proposes a new method for event detection, which uses a dependency tree based graph convolution network with aggregative attention to explicitly model and aggregate multi-order syntactic representations in sentences. Experimental comparison with state-of-the-art baselines shows the superiority of the proposed method.


Efficient Sequence Learning with Group Recurrent Networks
Fei Gao | Lijun Wu | Li Zhao | Tao Qin | Xueqi Cheng | Tie-Yan Liu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Recurrent neural networks have achieved state-of-the-art results in many artificial intelligence tasks, such as language modeling, neural machine translation, and speech recognition. One of the key factors behind these successes is big models. However, training such big models usually takes days or even weeks, even with tens of GPUs. In this paper, we propose an efficient architecture to improve the efficiency of such RNN model training, which adopts the group strategy for recurrent layers while exploiting a representation rearrangement strategy between layers as well as time steps. To demonstrate the advantages of our models, we conduct experiments on several datasets and tasks. The results show that our architecture achieves comparable or better accuracy compared with baselines, with a much smaller number of parameters and at a much lower computational cost.

Learning to Control the Specificity in Neural Response Generation
Ruqing Zhang | Jiafeng Guo | Yixing Fan | Yanyan Lan | Jun Xu | Xueqi Cheng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In conversation, a general response (e.g., “I don’t know”) could correspond to a large variety of input utterances. Previous generative conversational models usually employ a single model to learn the relationship between different utterance-response pairs, and thus tend to favor general and trivial responses that appear frequently. To address this problem, we propose a novel controlled response generation mechanism to handle different utterance-response relationships in terms of specificity. Specifically, we introduce an explicit specificity control variable into a sequence-to-sequence model, which interacts with the usage representation of words through a Gaussian Kernel layer, to guide the model to generate responses at different specificity levels. We describe two ways to acquire distant labels for the specificity control variable in learning. Empirical studies show that our model can significantly outperform the state-of-the-art response generation models under both automatic and human evaluations.

pdf bib
Tailored Sequence to Sequence Models to Different Conversation Scenarios
Hainan Zhang | Yanyan Lan | Jiafeng Guo | Jun Xu | Xueqi Cheng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sequence-to-sequence (Seq2Seq) models have been widely used for response generation in conversation. However, different conversation scenarios have distinct requirements. For example, customer service requires generated responses to be specific and accurate, while a chatbot prefers diverse responses in order to attract different users. The current Seq2Seq model fails to meet these diverse requirements because it uses the general average likelihood as its optimization criterion. As a result, it usually generates safe, commonplace responses such as ‘I don’t know’. In this paper, we propose two optimization criteria that tailor Seq2Seq to different conversation scenarios: the maximum generated likelihood for the specific-requirement scenario, and the conditional value-at-risk for the diverse-requirement scenario. Experimental results on the Ubuntu dialogue corpus (Ubuntu service scenario) and a Chinese Weibo dataset (social chatbot scenario) show that our proposed models not only satisfy the diverse requirements of different scenarios but also outperform traditional Seq2Seq models in both metric-based and human evaluations.
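
Conditional value-at-risk (CVaR) at level α is the expected loss within the worst α-fraction of outcomes, so minimizing it concentrates training on the cases the model currently handles worst instead of on the average case. The sketch below computes a simple empirical CVaR over a list of per-example losses (for instance, negative log-likelihoods; this choice is an assumption). It rounds the tail size up rather than using the exact fractional definition, and it is not the paper's training objective, whose exact form the abstract does not give.

```python
import math


def empirical_cvar(losses: list, alpha: float) -> float:
    """Average of the worst ceil(alpha * n) losses: a simple empirical CVaR.

    alpha = 1.0 recovers the ordinary mean loss (the average-likelihood
    criterion); smaller alpha focuses on the upper tail of the losses.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Usage: the mean hides the hardest examples; CVaR at alpha=0.5 exposes them.
losses = [1.0, 2.0, 3.0, 4.0]
print(empirical_cvar(losses, 1.0), empirical_cvar(losses, 0.5))
```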

pdf bib
Document Embedding Enhanced Event Detection with Hierarchical and Supervised Attention
Yue Zhao | Xiaolong Jin | Yuanzhuo Wang | Xueqi Cheng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Document-level information is very important for event detection, even at the sentence level. In this paper, we propose a novel Document Embedding Enhanced Bi-RNN model, called DEEB-RNN, to detect events in sentences. The model first learns event-detection-oriented document embeddings through a hierarchical, supervised attention-based RNN, which pays word-level attention to event triggers and sentence-level attention to sentences containing events. It then uses the learned document embedding to enhance another bidirectional RNN model that identifies event triggers and their types in sentences. Through experiments on the ACE-2005 dataset, we demonstrate the effectiveness and merits of the proposed DEEB-RNN model via comparison with state-of-the-art methods.
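
Hierarchical attention pools twice: words into a sentence vector, then sentences into a document vector. The sketch below shows that two-level pooling with plain dot-product attention over arbitrary input vectors (in the model these would be RNN hidden states). The query vectors, function names, and the squared-error form of the attention supervision are assumptions for illustration, not the paper's formulation.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, query):
    """Dot-product attention pooling: returns (pooled vector, attention weights)."""
    weights = softmax([sum(a * b for a, b in zip(v, query)) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def document_embedding(doc, word_query, sent_query):
    """Word-level attention inside each sentence, then sentence-level attention.

    `doc` is a list of sentences; each sentence is a list of word vectors.
    Returns (document vector, sentence-level attention weights).
    """
    sentence_vecs = [attend(sentence, word_query)[0] for sentence in doc]
    return attend(sentence_vecs, sent_query)


def attention_supervision_loss(weights, gold):
    """Squared error between predicted attention and a gold distribution,
    e.g. mass placed on annotated trigger words (loss form assumed)."""
    return sum((w - g) ** 2 for w, g in zip(weights, gold))


# Usage: the second sentence aligns better with the query, so it gets more weight.
doc = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]]]
vec, sent_weights = document_embedding(doc, [1.0, 0.0], [1.0, 0.0])
print(vec, sent_weights)
```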

pdf bib
Exploiting Contextual Information via Dynamic Memory Network for Event Detection
Shaobo Liu | Rui Cheng | Xiaoming Yu | Xueqi Cheng
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The task of event detection involves identifying and categorizing event triggers. Contextual information has been shown to be effective for this task. However, existing methods that utilize contextual information process the context only once. We argue that the context can be better exploited by processing it multiple times, which allows the model to perform complex reasoning and generate a better context representation, thus improving overall performance. Meanwhile, the dynamic memory network (DMN) has demonstrated a promising capability to capture contextual information and has been applied successfully to various tasks. Building on the DMN's multi-hop mechanism for modeling context, we propose the trigger detection dynamic memory network (TD-DMN) to tackle the event detection problem. We performed five-fold cross-validation on the ACE-2005 dataset, and the experimental results show that the multi-hop mechanism does improve performance and that the proposed model achieves the best F1 score compared with state-of-the-art methods.
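
The abstract does not spell out TD-DMN's update equations. The sketch below only illustrates the general idea of processing the same context multiple times: each hop queries the facts with the current memory, pools them with dot-product attention, and adds the result to the memory. It omits the recurrent (GRU-based) episodic memory of the original DMN design, the residual update is an assumption, and all names are hypothetical.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hop_memory(facts, question, hops):
    """Read the same context `hops` times, refining a memory vector each pass.

    Returns the final memory and the attention weights used at every hop.
    """
    memory = list(question)
    history = []
    for _ in range(hops):
        # Later hops query with the updated memory, so attention can shift
        # toward facts that only look relevant after an earlier read.
        probe = [m + q for m, q in zip(memory, question)]
        weights = softmax([sum(a * b for a, b in zip(f, probe)) for f in facts])
        episode = [sum(w * f[d] for w, f in zip(weights, facts))
                   for d in range(len(question))]
        memory = [m + e for m, e in zip(memory, episode)]  # residual update (assumed)
        history.append(weights)
    return memory, history


# Usage: three hops over two context vectors; inspect how attention evolves.
memory, history = multi_hop_memory([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], hops=3)
print(memory)
for hop, weights in enumerate(history, start=1):
    print(hop, weights)
```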


pdf bib
HANSpeller++: A Unified Framework for Chinese Spelling Correction
Shuiyuan Zhang | Jinhua Xiong | Jianpeng Hou | Qiao Zhang | Xueqi Cheng
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

pdf bib
Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations
Fei Sun | Jiafeng Guo | Yanyan Lan | Jun Xu | Xueqi Cheng
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
HANSpeller: A Unified Framework for Chinese Spelling Correction
Jinhua Xiong | Qiao Zhang | Shuiyuan Zhang | Jianpeng Hou | Xueqi Cheng
International Journal of Computational Linguistics & Chinese Language Processing, Volume 20, Number 1, June 2015-Special Issue on Chinese as a Foreign Language


pdf bib
Extended HMM and Ranking Models for Chinese Spelling Correction
Jinhua Xiong | Qiao Zhang | Jianpeng Hou | Qianbo Wang | Yuanzhuo Wang | Xueqi Cheng
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing


pdf bib
A Self-learning Template Approach for Recognizing Named Entities from Web Text
Qian Liu | Bingyang Liu | Dayong Wu | Yue Liu | Xueqi Cheng
Proceedings of the Sixth International Joint Conference on Natural Language Processing


pdf bib
MIEA: a Mutual Iterative Enhancement Approach for Cross-Domain Sentiment Classification
Qiong Wu | Songbo Tan | Xueqi Cheng | Miyi Duan
Coling 2010: Posters


pdf bib
Graph Ranking for Sentiment Transfer
Qiong Wu | Songbo Tan | Xueqi Cheng
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Improving SCL Model for Sentiment-Transfer Learning
Songbo Tan | Xueqi Cheng
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers


pdf bib
Chinese Named Entity Recognition Using Role Model
Hua-Ping Zhang | Qun Liu | Hong-Kui Yu | Xue-Qi Cheng | Shuo Bai
International Journal of Computational Linguistics & Chinese Language Processing, Volume 8, Number 2, August 2003

pdf bib
Chinese Lexical Analysis Using Hierarchical Hidden Markov Model
Hua-Ping Zhang | Qun Liu | Xue-Qi Cheng | Hao Zhang | Hong-Kui Yu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing


pdf bib
Automatic Recognition of Chinese Unknown Words Based on Roles Tagging
Kevin Zhang | Qun Liu | Hao Zhang | Xue-Qi Cheng
COLING-02: The First SIGHAN Workshop on Chinese Language Processing