Philip S. Yu

Also published as: Philip Yu


2022

pdf bib
CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking
Xuming Hu | Zhijiang Guo | GuanYu Wu | Aiwei Liu | Lijie Wen | Philip Yu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The explosion of misinformation spreading in the media ecosystem urges for automated fact-checking. While misinformation spans both geographic and linguistic boundaries, most work in the field has focused on English. Datasets and tools available in other languages, such as Chinese, are limited. In order to bridge this gap, we construct CHEF, the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims. The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet. Further, we develop established baselines and a novel approach that is able to model the evidence retrieval as a latent variable, allowing jointly training with the veracity prediction model in an end-to-end fashion. Extensive experiments show that CHEF will provide a challenging testbed for the development of fact-checking systems designed to retrieve and reason over non-English claims.

pdf bib
HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised Relation Extraction
Shuliang Liu | Xuming Hu | Chenwei Zhang | Shu’ang Li | Lijie Wen | Philip Yu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised relation extraction aims to extract the relationship between entities from natural language sentences without prior information on relational scope or distribution. Existing works either utilize self-supervised schemes to refine relational feature signals by iteratively leveraging adaptive clustering and classification that provoke gradual drift problems, or adopt instance-wise contrastive learning which unreasonably pushes apart those sentence pairs that are semantically similar. To overcome these defects, we propose a novel contrastive learning framework named HiURE, which has the capability to derive hierarchical signals from relational feature space using cross hierarchy attention and effectively optimize relation representation of sentences under exemplar-wise contrastive learning. Experimental results on two public datasets demonstrate the advanced effectiveness and robustness of HiURE on unsupervised relation extraction when compared with state-of-the-art models.

pdf bib
XLTime: A Cross-Lingual Knowledge Transfer Framework for Temporal Expression Extraction
Yuwei Cao | William Groves | Tanay Kumar Saha | Joel Tetreault | Alejandro Jaimes | Hao Peng | Philip Yu
Findings of the Association for Computational Linguistics: NAACL 2022

Temporal Expression Extraction (TEE) is essential for understanding time in natural language. It has applications in Natural Language Processing (NLP) tasks such as question answering, information retrieval, and causal inference. To date, work in this area has mostly focused on English as there is a scarcity of labeled data for other languages. We propose XLTime, a novel framework for multilingual TEE. XLTime works on top of pre-trained language models and leverages multi-task learning to prompt cross-language knowledge transfer both from English and within the non-English languages. XLTime alleviates problems caused by a shortage of data in the target language. We apply XLTime with different language models and show that it outperforms the previous automatic SOTA methods on French, Spanish, Portuguese, and Basque, by large margins. XLTime also closes the gap considerably on the handcrafted HeidelTime method.

pdf bib
Scene Graph Modification as Incremental Structure Expanding
Xuming Hu | Zhijiang Guo | Yu Fu | Lijie Wen | Philip S. Yu
Proceedings of the 29th International Conference on Computational Linguistics

A scene graph is a semantic representation that expresses the objects, attributes, and relationships between objects in a scene. Scene graphs play an important role in many cross modality tasks, as they are able to capture the interactions between images and texts. In this paper, we focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query. Unlike previous approaches that rebuilt the entire scene graph, we frame SGM as a graph expansion task by introducing the incremental structure expanding (ISE). ISE constructs the target graph by incrementally expanding the source graph without changing the unmodified structure. Based on ISE, we further propose a model that iterates between nodes prediction and edges prediction, inferring more accurate and harmonious expansion decisions progressively. In addition, we construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets. Experiments on four benchmarks demonstrate the effectiveness of our approach, which surpasses the previous state-of-the-art model by large margins.

pdf bib
Are Pre-trained Transformers Robust in Intent Classification? A Missing Ingredient in Evaluation of Out-of-Scope Intent Detection
Jianguo Zhang | Kazuma Hashimoto | Yao Wan | Zhiwei Liu | Ye Liu | Caiming Xiong | Philip Yu
Proceedings of the 4th Workshop on NLP for Conversational AI

Pre-trained Transformer-based models were reported to be robust in intent classification. In this work, we first point out the importance of in-domain out-of-scope detection in few-shot intent recognition tasks and then illustrate the vulnerability of pre-trained Transformer-based models against samples that are in-domain but out-of-scope (ID-OOS). We construct two new datasets, and empirically show that pre-trained models do not perform well on both ID-OOS examples and general out-of-scope examples, especially on fine-grained few-shot intent detection tasks.

2021

pdf bib
Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation
Ye Liu | Yao Wan | Jianguo Zhang | Wenting Zhao | Philip Yu
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The non-autoregressive models have boosted the efficiency of neural machine translation through parallelized decoding at the cost of effectiveness, when comparing with the autoregressive counterparts. In this paper, we claim that the syntactic and semantic structures among natural language are critical for non-autoregressive machine translation and can further improve the performance. However, these structures are rarely considered in the existing non-autoregressive models. Inspired by this intuition, we propose to incorporate the explicit syntactic and semantic structure of languages into a non-autoregressive Transformer, for the task of neural machine translation. Moreover, we also consider the intermediate latent alignment within target sentences to better learn the long-term token dependencies. Experimental results on two real-world datasets (i.e., WMT14 En-De and WMT16 En- Ro) show that our model achieves a significantly faster speed, as well as keeps the translation quality when compared with several state-of-the-art non-autoregressive models.

pdf bib
Incremental Few-shot Text Classification with Multi-round New Classes: Formulation, Dataset and System
Congying Xia | Wenpeng Yin | Yihao Feng | Philip Yu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Text classification is usually studied by labeling natural language texts with relevant categories from a predefined set. In the real world, new classes might keep challenging the existing system with limited labeled data. The system should be intelligent enough to recognize upcoming new classes with a few examples. In this work, we define a new task in the NLP domain, incremental few-shot text classification, where the system incrementally handles multiple rounds of new classes. For each round, there is a batch of new classes with a few labeled examples per class. Two major challenges exist in this new task: (i) For the learning process, the system should incrementally learn new classes round by round without re-training on the examples of preceding classes; (ii) For the performance, the system should perform well on new classes without much loss on preceding classes. In addition to formulating the new task, we also release two benchmark datasets in the incremental few-shot setting: intent classification and relation classification. Moreover, we propose two entailment approaches, ENTAILMENT and HYBRID, which show promise for solving this novel problem.

pdf bib
HTCInfoMax: A Global Model for Hierarchical Text Classification via Information Maximization
Zhongfen Deng | Hao Peng | Dongxiao He | Jianxin Li | Philip Yu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The current state-of-the-art model HiAGM for hierarchical text classification has two limitations. First, it correlates each text sample with all labels in the dataset which contains irrelevant information. Second, it does not consider any statistical constraint on the label representations learned by the structure encoder, while constraints for representation learning are proved to be helpful in previous work. In this paper, we propose HTCInfoMax to address these issues by introducing information maximization which includes two modules: text-label mutual information maximization and label prior matching. The first module can model the interaction between each text sample and its ground truth labels explicitly which filters out irrelevant information. The second one encourages the structure encoder to learn better representations with desired characteristics for all labels which can better handle label imbalance in hierarchical text classification. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed HTCInfoMax.

pdf bib
HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization
Ye Liu | Jianguo Zhang | Yao Wan | Congying Xia | Lifang He | Philip Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

To capture the semantic graph structure from raw text, most existing summarization approaches are built on GNNs with a pre-trained model. However, these methods suffer from cumbersome procedures and inefficient computations for long-text documents. To mitigate these issues, this paper proposes HetFormer, a Transformer-based pre-trained model with multi-granularity sparse attentions for long-text extractive summarization. Specifically, we model different types of semantic nodes in raw text as a potential heterogeneous graph and directly learn heterogeneous relationships (edges) among nodes by Transformer. Extensive experiments on both single- and multi-document summarization tasks show that HetFormer achieves state-of-the-art performance in Rouge F1 while using less memory and fewer parameters.

pdf bib
Few-Shot Intent Detection via Contrastive Pre-Training and Fine-Tuning
Jianguo Zhang | Trung Bui | Seunghyun Yoon | Xiang Chen | Zhiwei Liu | Congying Xia | Quan Hung Tran | Walter Chang | Philip Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this work, we focus on a more challenging few-shot intent detection scenario where many intents are fine-grained and semantically similar. We present a simple yet effective few-shot intent detection schema via contrastive pre-training and fine-tuning. Specifically, we first conduct self-supervised contrastive pre-training on collected intent datasets, which implicitly learns to discriminate semantically similar utterances without using any labels. We then perform few-shot intent detection together with supervised contrastive learning, which explicitly pulls utterances from the same intent closer and pushes utterances across different intents farther. Experimental results show that our proposed method achieves state-of-the-art performance on three challenging intent detection datasets under 5-shot and 10-shot settings.

pdf bib
Gradient Imitation Reinforcement Learning for Low Resource Relation Extraction
Xuming Hu | Chenwei Zhang | Yawen Yang | Xiaohe Li | Li Lin | Lijie Wen | Philip S. Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Low-resource Relation Extraction (LRE) aims to extract relation facts from limited labeled corpora when human annotation is scarce. Existing works either utilize self-training scheme to generate pseudo labels that will cause the gradual drift problem, or leverage meta-learning scheme which does not solicit feedback explicitly. To alleviate selection bias due to the lack of feedback loops in existing LRE learning paradigms, we developed a Gradient Imitation Reinforcement Learning method to encourage pseudo label data to imitate the gradient descent direction on labeled data and bootstrap its optimization capability through trial and error. We also propose a framework called GradLRE, which handles two major scenarios in low-resource relation extraction. Besides the scenario where unlabeled data is sufficient, GradLRE handles the situation where no unlabeled data is available, by exploiting a contextualized augmentation method to generate data. Experimental results on two public datasets demonstrate the effectiveness of GradLRE on low resource relation extraction when comparing with baselines.

pdf bib
PDALN: Progressive Domain Adaptation over a Pre-trained Model for Low-Resource Cross-Domain Named Entity Recognition
Tao Zhang | Congying Xia | Philip S. Yu | Zhiwei Liu | Shu Zhao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Cross-domain Named Entity Recognition (NER) transfers the NER knowledge from high-resource domains to the low-resource target domain. Due to limited labeled resources and domain shift, cross-domain NER is a challenging task. To address these challenges, we propose a progressive domain adaptation Knowledge Distillation (KD) approach – PDALN. It achieves superior domain adaptability by employing three components: (1) Adaptive data augmentation techniques, which alleviate cross-domain gap and label sparsity simultaneously; (2) Multi-level Domain invariant features, derived from a multi-grained MMD (Maximum Mean Discrepancy) approach, to enable knowledge transfer across domains; (3) Advanced KD schema, which progressively enables powerful pre-trained language models to perform domain adaptation. Extensive experiments on four benchmarks show that PDALN can effectively adapt high-resource domains to low-resource target domains, even if they are diverse in terms and writing styles. Comparison with other baselines indicates the state-of-the-art performance of PDALN.

pdf bib
Dense Hierarchical Retrieval for Open-domain Question Answering
Ye Liu | Kazuma Hashimoto | Yingbo Zhou | Semih Yavuz | Caiming Xiong | Philip Yu
Findings of the Association for Computational Linguistics: EMNLP 2021

Dense neural text retrieval has achieved promising results on open-domain Question Answering (QA), where latent representations of questions and passages are exploited for maximum inner product search in the retrieval process. However, current dense retrievers require splitting documents into short passages that usually contain local, partial and sometimes biased context, and highly depend on the splitting process. As a consequence, it may yield inaccurate and misleading hidden representations, thus deteriorating the final retrieval result. In this work, we propose Dense Hierarchical Retrieval (DHR), a hierarchical framework which can generate accurate dense representations of passages by utilizing both macroscopic semantics in the document and microscopic semantics specific to each passage. Specifically, a document-level retriever first identifies relevant documents, among which relevant passages are then retrieved by a passage-level retriever. The ranking of the retrieved passages will be further calibrated by examining the document-level relevance. In addition, hierarchical title structure and two negative sampling strategies (i.e., In-Doc and In-Sec negatives) are investigated. We apply DHR to large-scale open-domain QA datasets. DHR significantly outperforms the original dense passage retriever, and helps an end-to-end QA system outperform the strong baselines on multiple open-domain QA benchmarks.

pdf bib
Semi-supervised Relation Extraction via Incremental Meta Self-Training
Xuming Hu | Chenwei Zhang | Fukun Ma | Chenyao Liu | Lijie Wen | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2021

To alleviate human efforts from obtaining large-scale annotations, Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples. Existing self-training methods suffer from the gradual drift problem, where noisy pseudo labels on unlabeled data are incorporated during training. To alleviate the noise in pseudo labels, we propose a method called MetaSRE, where a Relation Label Generation Network generates accurate quality assessment on pseudo labels by (meta) learning from the successful and failed attempts on Relation Classification Network as an additional meta-objective. To reduce the influence of noisy pseudo labels, MetaSRE adopts a pseudo label selection and exploitation scheme which assesses pseudo label quality on unlabeled samples and only exploits high-quality pseudo labels in a self-training fashion to incrementally augment labeled samples for both robustness and accuracy. Experimental results on two public datasets demonstrate the effectiveness of the proposed approach.

pdf bib
Attend, Memorize and Generate: Towards Faithful Table-to-Text Generation in Few Shots
Wenting Zhao | Ye Liu | Yao Wan | Philip Yu
Findings of the Association for Computational Linguistics: EMNLP 2021

Few-shot table-to-text generation is a task of composing fluent and faithful sentences to convey table content using limited data. Despite many efforts having been made towards generating impressive fluent sentences by fine-tuning powerful pre-trained language models, the faithfulness of generated content still needs to be improved. To this end, this paper proposes a novel approach Attend, Memorize and Generate (called AMG), inspired by the text generation process of humans. In particular, AMG (1) attends over the multi-granularity of context using a novel strategy based on table slot level and traditional token-by-token level attention to exploit both the table structure and natural linguistic information; (2) dynamically memorizes the table slot allocation states; and (3) generates faithful sentences according to both the context and memory allocation states. Comprehensive experiments with human evaluation on three domains (i.e., humans, songs, and books) of the Wiki dataset show that our model can generate higher qualified texts when compared with several state-of-the-art baselines, in both fluency and faithfulness.

2020

pdf bib
MZET: Memory Augmented Zero-Shot Fine-grained Named Entity Typing
Tao Zhang | Congying Xia | Chun-Ta Lu | Philip Yu
Proceedings of the 28th International Conference on Computational Linguistics

Named entity typing (NET) is a classification task of assigning an entity mention in the context with given semantic types. However, with the growing size and granularity of the entity types, few previous researches concern with newly emerged entity types. In this paper, we propose MZET, a novel memory augmented FNET (Fine-grained NET) model, to tackle the unseen types in a zero-shot manner. MZET incorporates character-level, word-level, and contextural-level information to learn the entity mention representation. Besides, MZET considers the semantic meaning and the hierarchical structure into the entity type representation. Finally, through the memory component which models the relationship between the entity mention and the entity type, MZET transfers the knowledge from seen entity types to the zero-shot ones. Extensive experiments on three public datasets show the superior performance obtained by MZET, which surpasses the state-of-the-art FNET neural network models with up to 8% gain in Micro-F1 and Macro-F1 score.

pdf bib
Understanding Pre-trained BERT for Aspect-based Sentiment Analysis
Hu Xu | Lei Shu | Philip Yu | Bing Liu
Proceedings of the 28th International Conference on Computational Linguistics

This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based language models for ABSA. However, it is not clear how the general proxy task of (masked) language model trained on unlabeled corpus without annotations of aspects or opinions can provide important features for downstream tasks in ABSA. By leveraging the annotated datasets in ABSA, we investigate both the attentions and the learned representations of BERT pre-trained on reviews. We found that BERT uses very few self-attention heads to encode context words (such as prepositions or pronouns that indicating an aspect) and opinion words for an aspect. Most features in the representation of an aspect are dedicated to the fine-grained semantics of the domain (or product category) and the aspect itself, instead of carrying summarized opinions from its context. We hope this investigation can help future research in improving self-supervised learning, unsupervised learning and fine-tuning for ABSA. The pre-trained model and code can be found at https://github.com/howardhsu/BERT-for-RRC-ABSA.

pdf bib
Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks
Lichao Sun | Congying Xia | Wenpeng Yin | Tingting Liang | Philip Yu | Lifang He
Proceedings of the 28th International Conference on Computational Linguistics

Mixup is a latest data augmentation technique that linearly interpolates input examples and the corresponding labels. It has shown strong effectiveness in image classification by interpolating images at the pixel level. Inspired by this line of research, in this paper, we explore i) how to apply mixup to natural language processing tasks since text data can hardly be mixed in the raw format; ii) if mixup is still effective in transformer-based learning models,e.g., BERT.To achieve the goal, we incorporate mixup to transformer-based pre-trained architecture, named“mixup-transformer”, for a wide range of NLP tasks while keeping the whole end-to-end training system. We evaluate the proposed framework by running extensive experiments on the GLUEbenchmark. Furthermore, we also examine the performance of mixup-transformer in low-resource scenarios by reducing the training data with a certain ratio. Our studies show that mixup is a domain-independent data augmentation technique to pre-trained language models, resulting in significant performance improvement for transformer-based models.

pdf bib
User Memory Reasoning for Conversational Recommendation
Hu Xu | Seungwhan Moon | Honglei Liu | Bing Liu | Pararth Shah | Bing Liu | Philip Yu
Proceedings of the 28th International Conference on Computational Linguistics

We study an end-to-end approach for conversational recommendation that dynamically manages and reasons over users’ past (offline) preferences and current (online) requests through a structured and cumulative user memory knowledge graph. This formulation extends existing state tracking beyond the boundary of a single dialog to user state tracking (UST). For this study, we create a new Memory Graph (MG) <-> Conversational Recommendation parallel corpus called MGConvRex with 7K+ human-to-human role-playing dialogs, grounded on a large-scale user memory bootstrapped from real-world user scenarios. MGConvRex captures human-level reasoning over user memory and has disjoint training/testing sets of users for zero-shot (cold-start) reasoning for recommendation. We propose a simple yet expandable formulation for constructing and updating the MG, and an end-to-end graph-based reasoning model that updates MG from unstructured utterances and predicts optimal dialog policies (eg recommendation) based on updated MG. The prediction of our proposed model inherits the graph structure, providing a natural way to explain policies. Experiments are conducted for both offline metrics and online simulation, showing competitive results.

pdf bib
Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation
Zhongfen Deng | Hao Peng | Congying Xia | Jianxin Li | Lifang He | Philip Yu
Proceedings of the 28th International Conference on Computational Linguistics

Review rating prediction of text reviews is a rapidly growing technology with a wide range of applications in natural language processing. However, most existing methods either use hand-crafted features or learn features using deep learning with simple text corpus as input for review rating prediction, ignoring the hierarchies among data. In this paper, we propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation, which can serve as an effective decision-making tool for the academic paper review process. Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three). Each encoder first derives contextual representation of each level, then generates a higher-level representation, and after the learning process, we are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers. Furthermore, we introduce two new metrics to evaluate models in data imbalance situations. Extensive experiments on a publicly available dataset (PeerRead) and our own collected dataset (OpenReview) demonstrate the superiority of the proposed approach compared with state-of-the-art methods.

pdf bib
SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction
Xuming Hu | Lijie Wen | Yusong Xu | Chenwei Zhang | Philip Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Open relation extraction is the task of extracting open-domain relation facts from natural language sentences. Existing works either utilize heuristics or distant-supervised annotations to train a supervised classifier over pre-defined relations, or adopt unsupervised methods with additional assumptions that have less discriminative power. In this work, we propose a self-supervised framework named SelfORE, which exploits weak, self-supervised signals by leveraging large pretrained language model for adaptive clustering on contextualized relational features, and bootstraps the self-supervised signals by improving contextualized features in relation classification. Experimental results on three datasets show the effectiveness and robustness of SelfORE on open-domain Relation Extraction when comparing with competitive baselines.

pdf bib
Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference
Jianguo Zhang | Kazuma Hashimoto | Wenhao Liu | Chien-Sheng Wu | Yao Wan | Philip Yu | Richard Socher | Caiming Xiong
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Intent detection is one of the core components of goal-oriented dialog systems, and detecting out-of-scope (OOS) intents is also a practically important skill. Few-shot learning is attracting much attention to mitigate data scarcity, but OOS detection becomes even more challenging. In this paper, we present a simple yet effective approach, discriminative nearest neighbor classification with deep self-attention. Unlike softmax classifiers, we leverage BERT-style pairwise encoding to train a binary classifier that estimates the best matched training example for a user input. We propose to boost the discriminative ability by transferring a natural language inference (NLI) model. Our extensive experiments on a large-scale multi-domain intent detection task show that our method achieves more stable and accurate in-domain and OOS detection accuracy than RoBERTa-based classifiers and embedding-based nearest neighbor approaches. More notably, the NLI transfer enables our 10-shot model to perform competitively with 50-shot or even full-shot classifiers, while we can keep the inference time constant by leveraging a faster embedding retrieval model.

pdf bib
Improving Medical NLI Using Context-Aware Domain Knowledge
Shaika Chowdhury | Philip Yu | Yuan Luo
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Domain knowledge is important to understand both the lexical and relational associations of words in natural language text, especially for domain-specific tasks like Natural Language Inference (NLI) in the medical domain, where due to the lack of a large annotated dataset such knowledge cannot be implicitly learned during training. However, because of the linguistic idiosyncrasies of clinical texts (e.g., shorthand jargon), solely relying on domain knowledge from an external knowledge base (e.g., UMLS) can lead to wrong inference predictions as it disregards contextual information and, hence, does not return the most relevant mapping. To remedy this, we devise a knowledge adaptive approach for medical NLI that encodes the premise/hypothesis texts by leveraging supplementary external knowledge, alongside the UMLS, based on the word contexts. By incorporating refined domain knowledge at both the lexical and relational levels through a multi-source attention mechanism, it is able to align the token-level interactions between the premise and hypothesis more effectively. Comprehensive experiments and case study on the recently released MedNLI dataset are conducted to validate the effectiveness of the proposed approach.

pdf bib
Find or Classify? Dual Strategy for Slot-Value Predictions on Multi-Domain Dialog State Tracking
Jianguo Zhang | Kazuma Hashimoto | Chien-Sheng Wu | Yao Wang | Philip Yu | Richard Socher | Caiming Xiong
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Dialog state tracking (DST) is a core component in task-oriented dialog systems. Existing approaches for DST mainly fall into one of two categories, namely, ontology-based and ontology-free methods. An ontology-based method selects a value from a candidate-value list for each target slot, while an ontology-free method extracts spans from dialog contexts. Recent work introduced a BERT-based model to strike a balance between the two methods by pre-defining categorical and non-categorical slots. However, it is not clear enough which slots are better handled by either of the two slot types, and the way to use the pre-trained model has not been well investigated. In this paper, we propose a simple yet effective dual-strategy model for DST, by adapting a single BERT-style reading comprehension model to jointly handle both the categorical and non-categorical slots. Our experiments on the MultiWOZ datasets show that our method significantly outperforms the BERT-based counterpart, finding that the key is a deep interaction between the domain-slot and context information. When evaluated on noisy (MultiWOZ 2.0) and cleaner (MultiWOZ 2.1) settings, our method performs competitively and robustly across the two different settings. Our method sets the new state of the art in the noisy setting, while performing more robustly than the best model in the cleaner setting. We also conduct a comprehensive error analysis on the dataset, including the effects of the dual strategy for each slot, to facilitate future research.

pdf bib
Commonsense Evidence Generation and Injection in Reading Comprehension
Ye Liu | Tao Yang | Zeyu You | Wei Fan | Philip S. Yu
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Human tackle reading comprehension not only based on the given context itself but often rely on the commonsense beyond. To empower the machine with commonsense reasoning, in this paper, we propose a Commonsense Evidence Generation and Injection framework in reading comprehension, named CEGI. The framework injects two kinds of auxiliary commonsense evidence into comprehensive reading to equip the machine with the ability of rational thinking. Specifically, we build two evidence generators: one aims to generate textual evidence via a language model; the other aims to extract factual evidence (automatically aligned text-triples) from a commonsense knowledge graph after graph completion. Those evidences incorporate contextual commonsense and serve as the additional inputs to the reasoning model. Thereafter, we propose a deep contextual encoder to extract semantic relationships among the paragraph, question, option, and evidence. Finally, we employ a capsule network to extract different linguistic units (word and phrase) from the relations, and dynamically predict the optimal option based on the extracted units. Experiments on the CosmosQA dataset demonstrate that the proposed CEGI model outperforms the current state-of-the-art approaches and achieves the highest accuracy (83.6%) on the leaderboard.

pdf bib
Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection
Hoang Nguyen | Chenwei Zhang | Congying Xia | Philip Yu
Findings of the Association for Computational Linguistics: EMNLP 2020

Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances. Although recent works demonstrate that multi-level matching plays an important role in transferring learned knowledge from seen training classes to novel testing classes, they rely on a static similarity measure and overly fine-grained matching components. These limitations inhibit generalizing capability towards Generalized Few-shot Learning settings where both seen and novel classes are co-existent. In this paper, we propose a novel Semantic Matching and Aggregation Network where semantic components are distilled from utterances via multi-head self-attention with additional dynamic regularization constraints. These semantic components capture high-level information, resulting in more effective matching between instances. Our multi-perspective matching method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances. We also propose a more challenging evaluation setting that considers classification on the joint all-class label space. Extensive experimental results demonstrate the effectiveness of our method. Our code and data are publicly available.

pdf bib
DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis
Hu Xu | Bing Liu | Lei Shu | Philip Yu
Findings of the Association for Computational Linguistics: EMNLP 2020

This paper focuses on learning domain-oriented language models driven by end tasks, which aims to combine the worlds of both general-purpose language models (such as ELMo and BERT) and domain-specific language understanding. We propose DomBERT, an extension of BERT to learn from both in-domain corpus and relevant domain corpora. This helps in learning domain language models with low-resources. Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis (ABSA), demonstrating promising results.

pdf bib
Composed Variational Natural Language Generation for Few-shot Intents
Congying Xia | Caiming Xiong | Philip Yu | Richard Socher
Findings of the Association for Computational Linguistics: EMNLP 2020

In this paper, we focus on generating training examples for few-shot intents in the realistic imbalanced scenario. To build connections between existing many-shot intents and few-shot intents, we consider an intent as a combination of a domain and an action, and propose a composed variational natural language generator (CLANG), a transformer-based conditional variational autoencoder. CLANG utilizes two latent variables to represent the utterances corresponding to two different independent parts (domain and action) in the intent, and the latent variables are composed together to generate natural examples. Additionally, to improve the generator learning, we adopt the contrastive regularization loss that contrasts the in-class with the out-of-class utterance generation given the intent. To evaluate the quality of the generated utterances, experiments are conducted on the generalized few-shot intent detection task. Empirical results show that our proposed model achieves state-of-the-art performances on two real-world intent detection datasets.

2019

pdf bib
Multi-grained Named Entity Recognition
Congying Xia | Chenwei Zhang | Tao Yang | Yaliang Li | Nan Du | Xian Wu | Wei Fan | Fenglong Ma | Philip Yu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper presents a novel framework, MGNER, for Multi-Grained Named Entity Recognition where multiple entities or entity mentions in a sentence could be non-overlapping or totally nested. Different from traditional approaches regarding NER as a sequential labeling task and annotate entities consecutively, MGNER detects and recognizes entities on multiple granularities: it is able to recognize named entities without explicitly assuming non-overlapping or totally nested structures. MGNER consists of a Detector that examines all possible word segments and a Classifier that categorizes entities. In addition, contextual information and a self-attention mechanism are utilized throughout the framework to improve the NER performance. Experimental results show that MGNER outperforms current state-of-the-art baselines up to 4.4% in terms of the F1 score among nested/non-overlapping NER tasks.

pdf bib
Joint Slot Filling and Intent Detection via Capsule Neural Networks
Chenwei Zhang | Yaliang Li | Nan Du | Wei Fan | Philip Yu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Being able to recognize words as slots and detect the intent of an utterance has been a keen issue in natural language understanding. The existing works either treat slot filling and intent detection separately in a pipeline manner, or adopt joint models which sequentially label slots while summarizing the utterance-level intent without explicitly preserving the hierarchical relationship among words, slots, and intents. To exploit the semantic hierarchy for effective modeling, we propose a capsule-based neural network model which accomplishes slot filling and intent detection via a dynamic routing-by-agreement schema. A re-routing schema is proposed to further synergize the slot filling performance using the inferred intent representation. Experiments on two real-world datasets show the effectiveness of our model when compared with other alternative model architectures, as well as existing natural language understanding services.

pdf bib
BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis
Hu Xu | Bing Liu | Lei Shu | Philip Yu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploited to answer user questions. We call this problem Review Reading Comprehension (RRC). To the best of our knowledge, no existing work has been done on RRC. In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. Since ReviewRC has limited training examples for RRC (and also for aspect-based sentiment analysis), we then explore a novel post-training approach on the popular language model BERT to enhance the performance of fine-tuning of BERT for RRC. To show the generality of the approach, the proposed post-training is also applied to some other review-based tasks such as aspect extraction and aspect sentiment classification in aspect-based sentiment analysis. Experimental results demonstrate that the proposed post-training is highly effective.

pdf bib
Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce
Jianguo Zhang | Pengcheng Zou | Zhao Li | Yao Wan | Xiuming Pan | Yu Gong | Philip S. Yu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Nowadays, more and more customers browse and purchase products in favor of using mobile E-Commerce Apps such as Taobao and Amazon. Since merchants are usually inclined to describe redundant and over-informative product titles to attract attentions from customers, it is important to concisely display short product titles on limited screen of mobile phones. To address this discrepancy, previous studies mainly consider textual information of long product titles and lacks of human-like view during training and evaluation process. In this paper, we propose a Multi-Modal Generative Adversarial Network (MM-GAN) for short product title generation in E-Commerce, which innovatively incorporates image information and attribute tags from product, as well as textual information from original long titles. MM-GAN poses short title generation as a reinforcement learning process, where the generated titles are evaluated by the discriminator in a human-like view. Extensive experiments on a large-scale E-Commerce dataset demonstrate that our algorithm outperforms other state-of-the-art methods. Moreover, we deploy our model into a real-world online E-Commerce environment and effectively boost the performance of click through rate and click conversion rate by 1.66% and 1.87%, respectively.

2018

pdf bib
Zero-shot User Intent Detection via Capsule Neural Networks
Congying Xia | Chenwei Zhang | Xiaohui Yan | Yi Chang | Philip Yu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

User intent detection plays a critical role in question-answering and dialog systems. Most previous works treat intent detection as a classification problem where utterances are labeled with predefined intents. However, it is labor-intensive and time-consuming to label users’ utterances as intents are diversely expressed and novel intents will continually be involved. Instead, we study the zero-shot intent detection problem, which aims to detect emerging user intents where no labeled utterances are currently available. We propose two capsule-based architectures: IntentCapsNet that extracts semantic features from utterances and aggregates them to discriminate existing intents, and IntentCapsNet-ZSL which gives IntentCapsNet the zero-shot learning ability to discriminate emerging intents via knowledge transfer from existing intents. Experiments on two real-world datasets show that our model not only can better discriminate diversely expressed existing intents, but is also able to discriminate emerging intents when no labeled utterances are available.

pdf bib
Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction
Hu Xu | Bing Liu | Lei Shu | Philip S. Yu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

One key task of fine-grained sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using deep learning. Unlike other highly sophisticated supervised deep learning models, this paper proposes a novel and yet simple CNN model employing two types of pre-trained embeddings for aspect extraction: general-purpose embeddings and domain-specific embeddings. Without using any additional supervision, this model achieves surprisingly good results, outperforming state-of-the-art sophisticated existing methods. To our knowledge, this paper is the first to report such double embeddings based CNN model for aspect extraction and achieve very good results.