Bing Liu


2024

Proceedings of the 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024)
Elnaz Nouri | Abhinav Rastogi | Georgios Spithourakis | Bing Liu | Yun-Nung Chen | Yu Li | Alon Albalak | Hiromi Wakaki | Alexandros Papangelis
Proceedings of the 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024)

Noisy Multi-Label Text Classification via Instance-Label Pair Correction
Pengyu Xu | Mingyang Song | Linkaida Liu | Bing Liu | Hongjian Sun | Liping Jing | Jian Yu
Findings of the Association for Computational Linguistics: NAACL 2024

In noisy label learning, instance selection based on small-loss criteria has been proven highly effective. However, in noisy multi-label text classification (NMLTC), noise is not limited to the instance level but extends to the (instance-label) pair level. This gives rise to two main challenges. (1) Loss information at the pair level fails to capture the variation between instances. (2) There are two types of noise at the pair level, false positives and false negatives, and identifying false negatives from a large pool of negative pairs is exceedingly difficult. To tackle these issues, we propose instance-label pair correction (iLaCo), a novel approach to noisy pair selection and correction in NMLTC tasks. Specifically, we first introduce a holistic selection metric that identifies noisy pairs by simultaneously considering global loss information and instance-specific ranking information. Second, we employ a filter guided by label correlation to focus exclusively on negative pairs with label relevance, which significantly reduces the difficulty of identifying false negatives. Experimental analysis indicates that our framework effectively corrects noisy pairs in NMLTC datasets, leading to a significant improvement in model performance.
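
A rough sketch of the holistic selection idea is below: each (instance, label) pair is scored by combining where its loss sits globally across all pairs with where it ranks among the losses of its own instance. The formulation and the equal weighting are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

def holistic_noise_scores(pair_losses):
    """Score each (instance, label) pair; higher = more likely noisy.

    pair_losses: (n_instances, n_labels) array of per-pair BCE losses.
    Hypothetical rendering of a holistic metric, not iLaCo's equations.
    """
    flat = np.sort(pair_losses.ravel())
    # Global view: percentile of each pair's loss over the whole dataset
    # (the classic small-loss criterion, applied per pair).
    global_rank = np.searchsorted(flat, pair_losses) / flat.size
    # Instance view: rank of each pair's loss among its own instance's
    # labels, so easy and hard instances are judged on their own scale.
    local_rank = pair_losses.argsort(1).argsort(1) / (pair_losses.shape[1] - 1)
    # Equal weighting is an illustrative choice, not the paper's.
    return 0.5 * global_rank + 0.5 * local_rank
```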

Sentiment Analysis in the Era of Large Language Models: A Reality Check
Wenxuan Zhang | Yue Deng | Bing Liu | Sinno Pan | Lidong Bing
Findings of the Association for Computational Linguistics: NAACL 2024

Sentiment analysis (SA) has been a long-standing research area in natural language processing. With the recent advent of large language models (LLMs), there is great potential for their employment on SA problems. However, the extent to which current LLMs can be leveraged for different sentiment analysis tasks remains unclear. This paper aims to provide a comprehensive investigation into the capabilities of LLMs in performing various sentiment analysis tasks, from conventional sentiment classification to aspect-based sentiment analysis and multifaceted analysis of subjective texts. We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets. Our study reveals that while LLMs demonstrate satisfactory performance in simpler tasks, they lag behind in more complex tasks requiring a deeper understanding of specific sentiment phenomena or structured sentiment information. However, LLMs significantly outperform SLMs in few-shot learning settings, suggesting their potential when annotation resources are limited. We also highlight the limitations of current evaluation practices in assessing LLMs’ SA abilities and propose a novel benchmark, SentiEval, for a more comprehensive and realistic evaluation. Data and code are available at https://github.com/DAMO-NLP-SG/LLM-Sentiment.

An Evaluation Mechanism of LLM-based Agents on Manipulating APIs
Bing Liu | Zhou Jianxiang | Dan Meng | Haonan Lu
Findings of the Association for Computational Linguistics: EMNLP 2024

LLM-based agents can greatly extend the abilities of LLMs and have thus attracted a sharp increase in research attention. An ambitious vision – serving users by manipulating massive API-based tools – has been proposed and explored. However, we find that a widely accepted evaluation mechanism for generic agents is still missing. This work aims to fill this gap. We decompose tool-use capability into seven aspects and form a thorough evaluation schema. In addition, we design and release an instruction dataset and a toolset – the two sides that the agents bridge – following the principle of reflecting real-world challenges. Furthermore, we evaluate multiple generic agents. Our findings can inspire future research on improving LLM-based agents and prompt a rethinking of the philosophy of API design.

Probing Language Models for Pre-training Data Detection
Zhenhua Liu | Tong Zhu | Chuanyuan Tan | Bing Liu | Haonan Lu | Wenliang Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) have shown impressive capabilities while also raising concerns about data contamination, due to privacy issues and the leakage of benchmark datasets in the pre-training phase. It is therefore vital to detect contamination by checking whether an LLM has been pre-trained on the target texts. Recent studies focus on the generated texts and compute perplexities, which are superficial features and not reliable. In this study, we propose to utilize the probing technique for pre-training data detection by examining the model’s internal activations. Our method is simple and effective and leads to more trustworthy pre-training data detection. Additionally, we propose ArxivMIA, a new challenging benchmark comprising arXiv abstracts from the Computer Science and Mathematics categories. Our experiments demonstrate that our method outperforms all baselines and achieves state-of-the-art performance on both WikiMIA and ArxivMIA, with additional experiments confirming its efficacy.
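
As an illustration of activation probing, the sketch below extracts mean-pooled hidden states from one layer of a Hugging Face causal LM and fits a linear probe to separate seen from unseen texts. The layer choice, pooling, and probe type are assumptions; the paper's exact probe design may differ.

```python
import torch
from sklearn.linear_model import LogisticRegression

def probe_features(model, tokenizer, texts, layer=-1, device="cuda"):
    """Mean-pooled hidden states from one layer, used as probe inputs."""
    feats = []
    model.eval()
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        with torch.no_grad():
            out = model(**enc, output_hidden_states=True)
        # out.hidden_states: one (1, seq_len, dim) tensor per layer
        feats.append(out.hidden_states[layer].mean(dim=1).squeeze(0)
                     .float().cpu().numpy())
    return feats

# Fit a linear probe on texts with known membership labels
# (1 = in pre-training data, 0 = not), then score unseen texts:
# clf = LogisticRegression(max_iter=1000).fit(probe_features(m, tok, train_texts), y)
# member_prob = clf.predict_proba(probe_features(m, tok, test_texts))[:, 1]
```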

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Haoqiu Yan | Yongxin Zhu | Kai Zheng | Bing Liu | Haoyu Cao | Deqiang Jiang | Linli Xu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Model (LLM)-enhanced agents are becoming increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers’ intentions, resulting in inconsistent or even contradictory responses within dialogues. To bridge this gap, in this paper we propose PerceptiveAgent, an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings beyond the literal interpretations of words through the integration of speech modality perception. Employing LLMs as a cognitive core, PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language. Experimental results indicate that PerceptiveAgent excels in contextual understanding by accurately discerning the speakers’ true intentions in scenarios where the linguistic meaning is either contrary to or inconsistent with the speaker’s true feelings, producing more nuanced and expressive spoken dialogues. Code is publicly available at: https://github.com/Haoqiu-Yan/PerceptiveAgent.

Modeling Low-Resource Health Coaching Dialogues via Neuro-Symbolic Goal Summarization and Text-Units-Text Generation
Yue Zhou | Barbara Di Eugenio | Brian Ziebart | Lisa Sharp | Bing Liu | Nikolaos Agadakos
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Health coaching helps patients achieve personalized, lifestyle-related goals, effectively managing chronic conditions and alleviating mental health issues. It is particularly beneficial for low-socioeconomic-status populations, yet cost-prohibitive due to its highly personalized and labor-intensive nature. In this paper, we propose a neuro-symbolic goal summarizer to support health coaches in keeping track of goals, and a text-units-text dialogue generation model that converses with patients and helps them create and accomplish specific goals for physical activities. Our models outperform the previous state of the art while eliminating the need for a predefined schema and corresponding annotation. We also propose a new health coaching dataset extending previous work, and a metric to measure the unconventionality of the patient’s response based on data difficulty, facilitating potential coach alerts during deployment.

2023

Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast
Yiduo Guo | Yaobo Liang | Dongyan Zhao | Bing Liu | Nan Duan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Existing research has shown that a multilingual pre-trained language model fine-tuned with one (source) language also performs well on downstream tasks for non-source languages, even though no fine-tuning is done on these languages. However, there is a clear gap between the performance of the source language and that of the non-source languages. This paper analyzes the fine-tuning process, discovers when the performance gap changes and identifies which network weights affect the overall performance most. Additionally, the paper seeks to answer to what extent the gap can be reduced by reducing forgetting. Based on the analysis results, a method named Fine-tuning slow and fast with four training policies is proposed to address these issues. Experimental results show the proposed method outperforms baselines by a clear margin.

Introducing Semantics into Speech Encoders
Derek Xu | Shuyan Dong | Changhan Wang | Suyoun Kim | Zhaojiang Lin | Bing Liu | Akshat Shrivastava | Shang-Wen Li | Liang-Hsuan Tseng | Guan-Ting Lin | Alexei Baevski | Hung-yi Lee | Yizhou Sun | Wei Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent studies find that existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined supervised automatic speech recognition (ASR) to large language model (LLM) systems achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio transcriptions, which are expensive and time-consuming to obtain. We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve the spoken language understanding (SLU) performance of existing speech encoders by over 5% on intent classification (IC), with modest gains in named entity resolution (NER) and slot filling (SF), and improve the spoken question answering (SQA) FF1 score by over 2%. Our approach, which uses no ASR data, achieves similar performance as methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentations to existing speech encoders.

Class-Incremental Learning based on Label Generation
Yijia Shao | Yiduo Guo | Dongyan Zhao | Bing Liu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Despite the great success of pre-trained language models, it is still a challenge to use these models for continual learning, especially for the class-incremental learning (CIL) setting due to catastrophic forgetting (CF). This paper reports our finding that if we formulate CIL as a continual label generation problem, CF is drastically reduced and the generalizable representations of pre-trained models can be better retained. We thus propose a new CIL method (VAG) that also leverages the sparsity of vocabulary to focus the generation and creates pseudo-replay samples by using label semantics. Experimental results show that VAG outperforms baselines by a large margin.

A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution
Neeraj Varshney | Himanshu Gupta | Eric Robertson | Bing Liu | Chitta Baral
Findings of the Association for Computational Linguistics: ACL 2023

State-of-the-art natural language processing models have been shown to achieve remarkable performance in ‘closed-world’ settings where all the labels in the evaluation set are known at training time. However, in real-world settings, ‘novel’ instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate systematic research in this important area of ‘dealing with novelties’, we introduce NoveltyTask, a multi-stage task to evaluate a system’s performance on pipelined novelty ‘detection’ and ‘accommodation’ tasks. We provide a mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task, which pertains to identifying the correct author of a given text. We use the Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that the methods achieve considerably low performance, making the task challenging and leaving sufficient room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.

Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
Shengyao Zhuang | Bing Liu | Bevan Koopman | Guido Zuccon
Findings of the Association for Computational Linguistics: EMNLP 2023

In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document. Recently, advanced large language models (LLMs) have emerged as effective QLMs, showcasing promising ranking capabilities. This paper focuses on investigating the genuine zero-shot ranking effectiveness of recent LLMs, which are solely pre-trained on unstructured text data without supervised instruction fine-tuning. Our findings reveal the robust zero-shot ranking ability of such LLMs, highlighting that additional instruction fine-tuning may hinder effectiveness unless a question generation task is present in the fine-tuning dataset. Furthermore, we introduce a novel state-of-the-art ranking system that integrates LLM-based QLMs with a hybrid zero-shot retriever, demonstrating exceptional effectiveness in both zero-shot and few-shot scenarios. We make our codebase publicly available at https://github.com/ielab/llm-qlm.
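
The core scoring idea can be sketched directly: a document is scored by the log-likelihood a causal LM assigns to the query tokens when conditioned on the document. The prompt wording here is an illustrative guess; the authors' exact setup is in their repository (https://github.com/ielab/llm-qlm).

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def qlm_score(model, tokenizer, document, query, device="cuda"):
    """Sum of log-probs of the query tokens, conditioned on the document."""
    prompt = f"Passage: {document}\nPlease write a question based on this passage.\nQuestion:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    query_ids = tokenizer(" " + query, return_tensors="pt",
                          add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, query_ids], dim=1).to(device)
    with torch.no_grad():
        logits = model(input_ids).logits          # (1, seq_len, vocab)
    # Position i of the shifted logits predicts token i+1 of the input.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[0, -query_ids.shape[1]:].sum().item()  # query tokens only
```

Documents are then ranked for a query by sorting on this score in descending order.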

Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks
Zixuan Ke | Bing Liu | Wenhan Xiong | Asli Celikyilmaz | Haoran Li
Findings of the Association for Computational Linguistics: EMNLP 2023

Continual learning (CL) has two main objectives: preventing catastrophic forgetting (CF) and encouraging knowledge transfer (KT). The existing literature has mainly focused on overcoming CF; some work has also been done on KT when the tasks are similar. To our knowledge, only one method has been proposed to learn a sequence of mixed tasks, and these techniques still suffer from CF and/or limited KT. This paper proposes a new CL method to achieve both objectives. It overcomes CF by isolating the knowledge of each task via discovering a sub-network for it. A soft-masking mechanism is also proposed to preserve the previous knowledge and to enable the new task to leverage the past knowledge to achieve KT. Experiments using classification, generation, information extraction, and their mixture (i.e., heterogeneous tasks) show that the proposed method consistently outperforms strong baselines.

2022

Guiding Neural Entity Alignment with Compatibility
Bing Liu | Harrisen Scells | Wen Hua | Guido Zuccon | Genghong Zhao | Xia Zhang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Entity Alignment (EA) aims to find equivalent entities between two Knowledge Graphs (KGs). While numerous neural EA models have been devised, they are mainly learned using labelled data only. In this work, we argue that different entities within one KG should have compatible counterparts in the other KG due to the potential dependencies among the entities. Making compatible predictions should thus be one of the goals of training an EA model, along with fitting the labelled data; this aspect, however, is neglected in current methods. To power neural EA models with compatibility, we devise a training framework that addresses three problems: (1) how to measure the compatibility of an EA model; (2) how to inject the property of being compatible into an EA model; and (3) how to optimise the parameters of the compatibility model. Extensive experiments on widely-used datasets demonstrate the advantages of integrating compatibility within EA models. In fact, state-of-the-art neural EA models trained within our framework using just 5% of the labelled data can achieve comparable effectiveness with supervised training using 20% of the labelled data.

Semantic Novelty Detection and Characterization in Factual Text Involving Named Entities
Nianzu Ma | Sahisnu Mazumder | Alexander Politowicz | Bing Liu | Eric Robertson | Scott Grigsby
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Much of the existing work on text novelty detection has been studied at the topic level, i.e., identifying whether the topic of a document or a sentence is novel or not. Little work has been done at the fine-grained semantic level (or contextual level). For example, given that we know Elon Musk is the CEO of a technology company, the sentence “Elon Musk acted in the sitcom The Big Bang Theory” is novel and surprising because normally a CEO would not be an actor. Existing topic-based novelty detection methods work poorly on this problem because they do not perform semantic reasoning involving relations between named entities in the text and their background knowledge. This paper proposes an effective model (called PAT-SND) to solve the problem, which can also characterize the novelty. An annotated dataset is also created. Evaluation shows that PAT-SND outperforms 10 baselines by large margins.

Adapting a Language Model While Preserving its General Knowledge
Zixuan Ke | Yijia Shao | Haowei Lin | Hu Xu | Lei Shu | Bing Liu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Domain-adaptive pre-training (or DA-training for short), also known as post-training, aims to train a pre-trained general-purpose language model (LM) using an unlabeled corpus of a particular domain to adapt the LM so that end-tasks in the domain give improved performance. However, existing DA-training methods are in some sense blind, as they do not explicitly identify what knowledge in the LM should be preserved and what should be changed by the domain corpus. This paper shows that the existing methods are suboptimal and proposes a novel method to perform a more informed adaptation of the knowledge in the LM by (1) soft-masking the attention heads based on their importance to best preserve the general knowledge in the LM and (2) contrasting the representations of the general and the full (both general and domain) knowledge to learn an integrated representation with both general and domain-specific knowledge. Experimental results demonstrate the effectiveness of the proposed approach.
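
The soft-masking idea can be illustrated with a short sketch: gradients of units deemed important to the general LM are scaled down during DA-training so that general knowledge changes little. How importance is computed and applied here is a simplified assumption, not the paper's exact procedure.

```python
def soft_mask_gradients(model, importance):
    """Scale each parameter's gradient by (1 - importance) so that units
    important to the general LM change little during DA-training.
    `importance`: dict mapping parameter names to tensors in [0, 1],
    e.g. normalized gradient-based importance computed beforehand."""
    for name, param in model.named_parameters():
        if param.grad is not None and name in importance:
            param.grad.mul_(1.0 - importance[name])

# Inside the training loop:
# loss.backward()
# soft_mask_gradients(model, importance)
# optimizer.step(); optimizer.zero_grad()
```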

Continual Training of Language Models for Few-Shot Learning
Zixuan Ke | Haowei Lin | Yijia Shao | Hu Xu | Lei Shu | Bing Liu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications. Adapting or post-training an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain. This paper proposes the problem of continually extending an LM by incrementally post-training it with a sequence of unlabeled domain corpora, expanding its knowledge without forgetting its previous skills. The goal is to improve few-shot end-task learning in these domains. The resulting system, called CPT (Continual PostTraining), is to our knowledge the first continual post-training system. Experimental results verify its effectiveness.

KETOD: Knowledge-Enriched Task-Oriented Dialogue
Zhiyu Chen | Bing Liu | Seungwhan Moon | Chinnadhurai Sankar | Paul Crook | William Yang Wang
Findings of the Association for Computational Linguistics: NAACL 2022

Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains. Towards building a human-like assistant that can converse naturally and seamlessly with users, it is important to build a dialogue system that conducts both types of conversations effectively. In this work, we investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model. To this end, we create a new dataset, KETOD (Knowledge-Enriched Task-Oriented Dialogue), where we naturally enrich task-oriented dialogues with chit-chat based on relevant entity knowledge. We also propose two new models, SimpleToDPlus and Combiner, for the proposed task. Experimental results on both automatic and human evaluations show that the proposed methods can significantly improve the performance in knowledge-enriched response generation while maintaining a competitive task-oriented dialog performance. We believe our new dataset will be a valuable resource for future studies. Our dataset and code are publicly available at https://github.com/facebookresearch/ketod.

Proceedings of the 4th Workshop on NLP for Conversational AI
Bing Liu | Alexandros Papangelis | Stefan Ultes | Abhinav Rastogi | Yun-Nung Chen | Georgios Spithourakis | Elnaz Nouri | Weiyan Shi
Proceedings of the 4th Workshop on NLP for Conversational AI

Towards Enhancing Health Coaching Dialogue in Low-Resource Settings
Yue Zhou | Barbara Di Eugenio | Brian Ziebart | Lisa Sharp | Bing Liu | Ben Gerber | Nikolaos Agadakos | Shweta Yadav
Proceedings of the 29th International Conference on Computational Linguistics

Health coaching helps patients identify and accomplish lifestyle-related goals, effectively improving the control of chronic diseases and mitigating mental health conditions. However, health coaching is cost-prohibitive due to its highly personalized and labor-intensive nature. In this paper, we propose to build a dialogue system that converses with the patients, helps them create and accomplish specific goals, and can address their emotions with empathy. However, building such a system is challenging since real-world health coaching datasets are limited and empathy is subtle. Thus, we propose a modularized health coaching dialogue with simplified NLU and NLG frameworks combined with mechanism-conditioned empathetic response generation. Through automatic and human evaluation, we show that our system generates more empathetic, fluent, and coherent responses and outperforms the state-of-the-art in NLU tasks while requiring less annotation. We view our approach as a key step towards building automated and more accessible health coaching systems.

2021

Concept-Based Label Embedding via Dynamic Routing for Hierarchical Text Classification
Xuepeng Wang | Li Zhao | Bing Liu | Tao Chen | Feng Zhang | Di Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Hierarchical Text Classification (HTC) is a challenging task that categorizes a textual description within a taxonomic hierarchy. Most existing methods focus on modeling the text. Recently, researchers have attempted to model the class representations with external resources (e.g., dictionaries). However, the concept shared among classes, which is a kind of domain-specific and fine-grained information, has been ignored in previous work. In this paper, we propose a novel concept-based label embedding method that can explicitly represent the concepts and model the sharing mechanism among classes for hierarchical text classification. Experimental results on two widely used datasets show that the proposed model outperforms several state-of-the-art methods. We release our complementary resources (concepts and definitions of classes) for these two datasets to benefit research on HTC.

Adding Chit-Chat to Enhance Task-Oriented Dialogues
Kai Sun | Seungwhan Moon | Paul Crook | Stephen Roller | Becka Silvert | Bing Liu | Zhiguang Wang | Honglei Liu | Eunjoon Cho | Claire Cardie
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations. In this work, we propose to integrate both types of systems by Adding Chit-Chat to ENhance Task-ORiented dialogues (ACCENTOR), with the goal of making virtual assistant conversations more engaging and interactive. Specifically, we propose a Human <-> AI collaborative data collection approach for generating diverse chit-chat responses to augment task-oriented dialogues with minimal annotation effort. We then present our new chit-chat-based annotations to 23.8K dialogues from two popular task-oriented datasets (Schema-Guided Dialogue and MultiWOZ 2.1) and demonstrate their advantage over the originals via human evaluation. Lastly, we propose three new models for adding chit-chat to task-oriented dialogues, explicitly trained to predict user goals and to generate contextually relevant chit-chat responses. Automatic and human evaluations show that, compared with the state-of-the-art task-oriented baseline, our models can code-switch between task and chit-chat to be more engaging, interesting, knowledgeable, and humanlike, while maintaining competitive task performance.

Adapting BERT for Continual Learning of a Sequence of Aspect Sentiment Classification Tasks
Zixuan Ke | Hu Xu | Bing Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks. Although some CL techniques have been proposed for document sentiment classification, we are not aware of any CL work on ASC. A CL system that incrementally learns a sequence of ASC tasks should address the following two issues: (1) transfer knowledge learned from previous tasks to the new task to help it learn a better model, and (2) maintain the performance of the models for previous tasks so that they are not forgotten. This paper proposes a novel capsule network based model called B-CL to address these issues. B-CL markedly improves the ASC performance on both the new task and the old tasks via forward and backward knowledge transfer. The effectiveness of B-CL is demonstrated through extensive experiments.

Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking
Zhaojiang Lin | Bing Liu | Seungwhan Moon | Paul Crook | Zhenpeng Zhou | Zhiguang Wang | Zhou Yu | Andrea Madotto | Eunjoon Cho | Rajen Subba
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Zero-shot cross-domain dialogue state tracking (DST) enables us to handle unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot-description-enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes a dialogue context and a slot with a pre-trained self-attentive encoder, and then generates the slot value in an auto-regressive manner. In addition, we incorporate Slot Type Informed Descriptions that capture the shared information of different slots to facilitate cross-domain knowledge transfer. Experimental results on MultiWOZ show that our model significantly improves on existing state-of-the-art results in the zero-shot cross-domain setting.
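
A minimal sketch of the generative, description-driven formulation: the dialogue context and a natural-language slot description are concatenated into a text-to-text input, and the slot value is generated. The input template and model size below are illustrative assumptions, not the paper's exact format.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def predict_slot_value(history, slot_description):
    """Generate a slot value from the dialogue context plus a
    natural-language slot description (input template is illustrative)."""
    text = f"track dialogue state: {history} [slot] {slot_description}"
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    out = model.generate(ids, max_new_tokens=8)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# predict_slot_value(
#     "user: book me a cheap hotel in the north",
#     "hotel price range: preferred cost of the hotel (categorical)")
```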

Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models
Tianxing He | Jun Liu | Kyunghyun Cho | Myle Ott | Bing Liu | James Glass | Fuchun Peng
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this work, we study how the finetuning stage in the pretrain-finetune framework changes the behavior of a pretrained neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. Our major finding is that after standard finetuning, the model forgets some of the important language generation skills acquired during large-scale pretraining. We demonstrate the forgetting phenomenon through a set of detailed behavior analysis from the perspectives of knowledge transfer, context sensitivity, and function space projection. As a preliminary attempt to alleviate the forgetting problem, we propose an intuitive finetuning strategy named “mix-review”. We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.

Summarizing Behavioral Change Goals from SMS Exchanges to Support Health Coaches
Itika Gupta | Barbara Di Eugenio | Brian D. Ziebart | Bing Liu | Ben S. Gerber | Lisa K. Sharp
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Regular physical activity is associated with a reduced risk of chronic diseases such as type 2 diabetes and improved mental well-being. Yet, more than half of the US population is insufficiently active. Health coaching has been successful in promoting healthy behaviors. In this paper, we present our work towards assisting health coaches by extracting the physical activity goal the user and coach negotiate via text messages. We show that information captured by dialogue acts can help to improve the goal extraction results. We employ both traditional and transformer-based machine learning models for dialogue acts prediction and find them statistically indistinguishable in performance on our health coaching dataset. Moreover, we discuss the feedback provided by the health coaches when evaluating the correctness of the extracted goal summaries. This work is a step towards building a virtual assistant health coach to promote a healthy lifestyle.

Detecting Domain Polarity-Changes of Words in a Sentiment Lexicon
Shuai Wang | Guangyi Lv | Sahisnu Mazumder | Bing Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions
Zhiyu Chen | Honglei Liu | Hu Xu | Seungwhan Moon | Hao Zhou | Bing Liu
Findings of the Association for Computational Linguistics: EMNLP 2021

Existing conversational systems are mostly agent-centric, assuming that user utterances will closely follow the system ontology. However, in real-world scenarios, it is highly desirable that users can speak freely and naturally. In this work, we attempt to build a user-centric dialogue system for conversational recommendation. As there is no clean mapping from a user’s free-form utterance to an ontology, we first model the user preferences as estimated distributions over the system ontology and map the user’s utterances to such distributions. Learning such a mapping poses new challenges in reasoning over various types of knowledge, ranging from factoid knowledge and commonsense knowledge to the users’ own situations. To this end, we build a new dataset named NUANCED that focuses on such realistic settings, with 5.1k dialogues and 26k turns of high-quality user responses. We conduct experiments showing both the usefulness and the challenges of our problem setting. We believe NUANCED can serve as a valuable resource to push existing research from agent-centric systems to user-centric systems. The code and data are publicly available.

Semantic Novelty Detection in Natural Language Descriptions
Nianzu Ma | Alexander Politowicz | Sahisnu Mazumder | Jiahua Chen | Bing Liu | Eric Robertson | Scott Grigsby
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper proposes to study a fine-grained semantic novelty detection task, which can be illustrated with the following example. It is normal that a person walks a dog in the park, but if someone says “A man is walking a chicken in the park”, it is novel. Given a set of natural language descriptions of normal scenes, we want to identify descriptions of novel scenes. We are not aware of any existing work that solves the problem. Although existing novelty or anomaly detection algorithms are applicable, since they are usually topic-based, they perform poorly on our fine-grained semantic novelty detection task. This paper proposes an effective model (called GAT-MA) to solve the problem and also contributes a new dataset. Experimental evaluation shows that GAT-MA outperforms 11 baselines by large margins.

ActiveEA: Active Learning for Neural Entity Alignment
Bing Liu | Harrisen Scells | Guido Zuccon | Wen Hua | Genghong Zhao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Entity Alignment (EA) aims to match equivalent entities across different Knowledge Graphs (KGs) and is an essential step of KG fusion. Current mainstream methods – neural EA models – rely on training with seed alignment, i.e., a set of pre-aligned entity pairs, which is very costly to annotate. In this paper, we devise a novel Active Learning (AL) framework for neural EA, aiming to create highly informative seed alignment and thus obtain more effective EA models with less annotation cost. Our framework tackles two main challenges encountered when applying AL to EA: (1) How to exploit dependencies between entities within the AL strategy. Most AL strategies assume that the data instances to sample are independent and identically distributed; however, entities in KGs are related. To address this challenge, we propose a structure-aware uncertainty sampling strategy that can measure the uncertainty of each entity as well as its impact on its neighbour entities in the KG. (2) How to recognise entities that appear in one KG but not in the other (i.e., bachelors), since identifying bachelors would likely save annotation budget. To address this challenge, we devise a bachelor recognizer that pays attention to alleviating the effect of sampling bias. Empirical results show that our proposed AL strategy can significantly improve sampling quality, with good generality across different datasets, EA models and amounts of bachelors.
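
A toy version of the structure-aware uncertainty sampling might look like the following: an entity's acquisition score blends its own uncertainty with that of its KG neighbours, so labelling it is also informative for nearby entities. The blending weight and the mean aggregation are assumptions, not the paper's formulation.

```python
import numpy as np

def structure_aware_scores(entity_uncertainty, neighbours, alpha=0.5):
    """entity_uncertainty: dict entity -> float (e.g., prediction entropy);
    neighbours: dict entity -> list of neighbouring entities in the KG.
    An entity's score blends its own uncertainty with its neighbours',
    so labelling it also reduces uncertainty nearby."""
    scores = {}
    for e, u in entity_uncertainty.items():
        nbrs = neighbours.get(e, [])
        nbr_u = np.mean([entity_uncertainty[n] for n in nbrs]) if nbrs else 0.0
        scores[e] = alpha * u + (1.0 - alpha) * nbr_u
    return scores  # annotate the highest-scoring entities first
```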

CLASSIC: Continual and Contrastive Learning of Aspect Sentiment Classification Tasks
Zixuan Ke | Bing Liu | Hu Xu | Lei Shu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks in a particular CL setting called domain incremental learning (DIL). Each task is from a different domain or product. The DIL setting is particularly suited to ASC because in testing the system need not know the task/domain to which the test data belongs. To our knowledge, this setting has not been studied before for ASC. This paper proposes a novel model called CLASSIC. The key novelty is a contrastive continual learning method that enables both knowledge transfer across tasks and knowledge distillation from old tasks to the new task, which eliminates the need for task ids in testing. Experimental results show the high effectiveness of CLASSIC.

Continual Learning in Task-Oriented Dialogue Systems
Andrea Madotto | Zhaojiang Lin | Zhenpeng Zhou | Seungwhan Moon | Paul Crook | Bing Liu | Zhou Yu | Eunjoon Cho | Pascale Fung | Zhiguang Wang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Continual learning in task-oriented dialogue systems allows the system to add new domains and functionalities over time after deployment, without incurring the high cost of retraining the whole system each time. In this paper, we propose a first-ever continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in both modularized and end-to-end learning settings. In addition, we implement and compare multiple existing continual learning baselines, and we propose a simple yet effective architectural method based on residual adapters. We also suggest that the upper-bound performance of continual learning should be equivalent to multi-task learning when data from all domains is available at once. Our experiments demonstrate that the proposed architectural method and a simple replay-based strategy perform better, by a large margin, than other continual learning techniques, and only slightly worse than the multi-task learning upper bound, while being 20X faster in learning new domains. We also report several trade-offs in terms of parameter usage, memory size and training time, which are important in the design of a task-oriented dialogue system. The proposed benchmark is released to promote more research in this direction.

Zero-Shot Dialogue State Tracking via Cross-Task Transfer
Zhaojiang Lin | Bing Liu | Andrea Madotto | Seungwhan Moon | Zhenpeng Zhou | Paul Crook | Zhiguang Wang | Zhou Yu | Eunjoon Cho | Rajen Subba | Pascale Fung
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data. In this work, we propose to transfer cross-task knowledge from general question answering (QA) corpora for the zero-shot DST task. Specifically, we propose TransferQA, a transferable generative QA model that seamlessly combines extractive QA and multi-choice QA via a text-to-text transformer framework, and tracks both categorical slots and non-categorical slots in DST. In addition, we introduce two effective ways to construct unanswerable questions, namely negative question sampling and context truncation, which enable our model to handle none-value slots in the zero-shot DST setting. Extensive experiments show that our approaches substantially improve the existing zero-shot and few-shot results on MultiWOZ. Moreover, compared to the fully trained baseline on the Schema-Guided Dialogue dataset, our approach shows better generalization ability in unseen domains.
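
The two ways of constructing unanswerable questions can be sketched directly: pair a context with a question sampled from another example (negative question sampling), or cut the context just before the answer span (context truncation). The data format and helper below are hypothetical simplifications of the paper's procedure.

```python
import random

def make_unanswerable(examples):
    """examples: list of {"context", "question", "answer"} dicts with the
    answer appearing verbatim in the context. Returns 'none'-answer data."""
    out = []
    for ex in examples:
        # Negative question sampling: pair the context with a question
        # drawn from another example, so the answer is absent.
        other = random.choice(examples)
        if other["question"] != ex["question"]:
            out.append({"context": ex["context"],
                        "question": other["question"], "answer": "none"})
        # Context truncation: cut the context just before the answer span,
        # making the original question unanswerable.
        idx = ex["context"].find(ex["answer"])
        if idx > 0:
            out.append({"context": ex["context"][:idx],
                        "question": ex["question"], "answer": "none"})
    return out
```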

Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI
Alexandros Papangelis | Paweł Budzianowski | Bing Liu | Elnaz Nouri | Abhinav Rastogi | Yun-Nung Chen
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

2020

Understanding Pre-trained BERT for Aspect-based Sentiment Analysis
Hu Xu | Lei Shu | Philip Yu | Bing Liu
Proceedings of the 28th International Conference on Computational Linguistics

This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based language models for ABSA. However, it is not clear how the general proxy task of (masked) language modeling, trained on an unlabeled corpus without annotations of aspects or opinions, can provide important features for downstream tasks in ABSA. By leveraging the annotated datasets in ABSA, we investigate both the attentions and the learned representations of BERT pre-trained on reviews. We find that BERT uses very few self-attention heads to encode context words (such as prepositions or pronouns that indicate an aspect) and opinion words for an aspect. Most features in the representation of an aspect are dedicated to the fine-grained semantics of the domain (or product category) and the aspect itself, instead of carrying summarized opinions from its context. We hope this investigation can help future research in improving self-supervised learning, unsupervised learning and fine-tuning for ABSA. The pre-trained model and code can be found at https://github.com/howardhsu/BERT-for-RRC-ABSA.

Bayes-enhanced Lifelong Attention Networks for Sentiment Classification
Hao Wang | Shuai Wang | Sahisnu Mazumder | Bing Liu | Yan Yang | Tianrui Li
Proceedings of the 28th International Conference on Computational Linguistics

The classic deep learning paradigm learns a model from the training data of a single task and the learned model is also tested on the same task. This paper studies the problem of learning a sequence of tasks (sentiment classification tasks in our case). After each sentiment classification task is learned, its knowledge is retained to help future task learning. Following this setting, we explore attention neural networks and propose a Bayes-enhanced Lifelong Attention Network (BLAN). The key idea is to exploit the generative parameters of naive Bayes to learn attention knowledge. The learned knowledge from each task is stored in a knowledge base and later used to build lifelong attentions. The constructed lifelong attentions are then used to enhance the attention of the network to help new task learning. Experimental results on product reviews from Amazon.com show the effectiveness of the proposed model.

Transformation of Dense and Sparse Text Representations
Wenpeng Hu | Mengyu Wang | Bing Liu | Feng Ji | Jinwen Ma | Dongyan Zhao
Proceedings of the 28th International Conference on Computational Linguistics

Sparsity is regarded as a desirable property of representations, especially in terms of explanation. However, its usage has been limited due to the gap with dense representations. Most recent research progress in NLP is based on dense representations, so the desirable property of sparsity cannot be leveraged. Inspired by the Fourier transform, in this paper we propose a novel Semantic Transformation method to bridge the dense and sparse spaces, which can facilitate NLP research in shifting from dense spaces to sparse spaces or in jointly using both. Experiments on classification tasks and a natural language inference task show that the proposed Semantic Transformation is effective.

Translation vs. Dialogue: A Comparative Analysis of Sequence-to-Sequence Modeling
Wenpeng Hu | Ran Le | Bing Liu | Jinwen Ma | Dongyan Zhao | Rui Yan
Proceedings of the 28th International Conference on Computational Linguistics

Understanding neural models is a major topic of interest in the deep learning community. In this paper, we propose to interpret a general neural model comparatively. Specifically, we study the sequence-to-sequence (Seq2Seq) model in the context of two mainstream NLP tasks, machine translation and dialogue response generation, both of which use the Seq2Seq model. We investigate how the two tasks differ and how their task difference results in major differences in the behaviors of the resulting translation and dialogue generation systems. This study allows us to make several interesting observations and gain valuable insights, which can be used to help develop better translation and dialogue generation models. To our knowledge, no such comparative study has been done so far.

User Memory Reasoning for Conversational Recommendation
Hu Xu | Seungwhan Moon | Honglei Liu | Bing Liu | Pararth Shah | Bing Liu | Philip Yu
Proceedings of the 28th International Conference on Computational Linguistics

We study an end-to-end approach for conversational recommendation that dynamically manages and reasons over users’ past (offline) preferences and current (online) requests through a structured and cumulative user memory knowledge graph. This formulation extends existing state tracking beyond the boundary of a single dialog to user state tracking (UST). For this study, we create a new Memory Graph (MG) <-> Conversational Recommendation parallel corpus called MGConvRex with 7K+ human-to-human role-playing dialogs, grounded on a large-scale user memory bootstrapped from real-world user scenarios. MGConvRex captures human-level reasoning over user memory and has disjoint training/testing sets of users for zero-shot (cold-start) reasoning for recommendation. We propose a simple yet expandable formulation for constructing and updating the MG, and an end-to-end graph-based reasoning model that updates the MG from unstructured utterances and predicts optimal dialog policies (e.g., recommendation) based on the updated MG. The prediction of our proposed model inherits the graph structure, providing a natural way to explain its policies. Experiments are conducted for both offline metrics and online simulation, showing competitive results.

Entity-Aware Dependency-Based Deep Graph Attention Network for Comparative Preference Classification
Nianzu Ma | Sahisnu Mazumder | Hao Wang | Bing Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper studies the task of comparative preference classification (CPC). Given two entities in a sentence, our goal is to classify whether the first (or the second) entity is preferred over the other or no comparison is expressed at all between the two entities. Existing works either do not learn entity-aware representations well and fail to deal with sentences involving multiple entity pairs or use sequential modeling approaches that are unable to capture long-range dependencies between the entities. Some also use traditional machine learning approaches that do not generalize well. This paper proposes a novel Entity-aware Dependency-based Deep Graph Attention Network (ED-GAT) that employs a multi-hop graph attention over a dependency graph sentence representation to leverage both the semantic information from word embeddings and the syntactic information from the dependency graph to solve the problem. Empirical evaluation shows that the proposed model achieves the state-of-the-art performance in comparative preference classification.

Feature Projection for Improved Text Classification
Qi Qin | Wenpeng Hu | Bing Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In classification, there are usually some good features that are indicative of class labels. For example, in sentiment classification, words like good and nice are indicative of the positive sentiment, and words like bad and terrible are indicative of the negative sentiment. However, there are also many common features (e.g., words) that are not indicative of any specific class (e.g., voice and screen, which are common to both sentiment classes and are not discriminative for classification). Although deep learning has made significant progress in generating discriminative features through its powerful representation learning, we believe there is still room for improvement. In this paper, we propose a novel angle from which to further improve this representation learning: feature projection. This method projects existing features into the orthogonal space of the common features. The resulting projection is thus perpendicular to the common features and more discriminative for classification. We apply this new method to improve CNN-, RNN-, Transformer-, and BERT-based text classification and obtain markedly better results.
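
The projection step itself is compact enough to write down: subtract from each feature vector its component along the common-feature direction, leaving a vector orthogonal to the common features. How the common features c are obtained (the paper uses a separate common-feature learning network) is omitted here.

```python
import torch

def feature_projection(f, c, eps=1e-8):
    """f: (batch, dim) input features; c: (batch, dim) common features.
    Subtract f's component along c, leaving a vector orthogonal to the
    common features and hence more discriminative."""
    coef = (f * c).sum(-1, keepdim=True) / (c.norm(dim=-1, keepdim=True) ** 2 + eps)
    return f - coef * c
```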

Human-Human Health Coaching via Text Messages: Corpus, Annotation, and Analysis
Itika Gupta | Barbara Di Eugenio | Brian Ziebart | Aiswarya Baiju | Bing Liu | Ben Gerber | Lisa Sharp | Nadia Nabulsi | Mary Smart
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Our goal is to develop and deploy a virtual assistant health coach that can help patients set realistic physical activity goals and live a more active lifestyle. Since there is no publicly shared dataset of health coaching dialogues, the first phase of our research focused on data collection. We hired a certified health coach and 28 patients to collect the first round of human-human health coaching interactions, which took place via text messages. This resulted in 2853 messages. The data collection phase was followed by conversation analysis to gain insight into the way information exchange takes place between a health coach and a patient. This was formalized using two annotation schemas: one that focuses on the goals the patient is setting and another that models the higher-level structure of the interactions. In this paper, we discuss these schemas and briefly describe their application for automatically extracting activity goals and annotating the second round of data, collected with different health coaches and patients. Given the resource-intensive nature of data annotation, successfully annotating a new dataset automatically is key to meeting the need for high-quality, large datasets.

Using the Past Knowledge to Improve Sentiment Classification
Qi Qin | Wenpeng Hu | Bing Liu
Findings of the Association for Computational Linguistics: EMNLP 2020

This paper studies sentiment classification in the lifelong learning setting, which incrementally learns a sequence of sentiment classification tasks. It proposes a new lifelong learning model (called L2PG) that can retain and selectively transfer the knowledge learned in the past to help learn the new task. A key innovation of the proposed model is a novel parameter-gate (p-gate) mechanism that regulates the flow or transfer of the previously learned knowledge to the new task. Specifically, it can selectively use the network parameters (which represent the retained knowledge gained from the previous tasks) to assist the learning of the new task t. Knowledge distillation is also employed in the process to preserve the past knowledge by approximating the network output at the state when task t-1 was learned. Experimental results show that L2PG outperforms strong baselines, including multi-task learning.
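
A hypothetical rendering of the parameter-gate idea: a learned element-wise sigmoid gate mixes frozen weights from the previous tasks with freshly learned task-specific weights, so past knowledge is transferred selectively. This sketch omits the knowledge-distillation term described above.

```python
import torch
import torch.nn as nn

class PGateLinear(nn.Module):
    """Linear layer whose weights are a gated mix of frozen past-task
    weights and new task-specific weights (illustrative, not L2PG's
    exact architecture)."""

    def __init__(self, old_weight, old_bias):
        super().__init__()
        self.register_buffer("w_old", old_weight)        # frozen past knowledge
        self.w_new = nn.Parameter(torch.zeros_like(old_weight))
        self.b_new = nn.Parameter(old_bias.clone())
        self.gate = nn.Parameter(torch.zeros_like(old_weight))  # pre-sigmoid

    def forward(self, x):
        g = torch.sigmoid(self.gate)                     # element-wise in (0, 1)
        w = g * self.w_old + (1.0 - g) * self.w_new
        return x @ w.t() + self.b_new
```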

A Knowledge-Driven Approach to Classifying Object and Attribute Coreferences in Opinion Mining
Jiahua Chen | Shuai Wang | Sahisnu Mazumder | Bing Liu
Findings of the Association for Computational Linguistics: EMNLP 2020

Classifying and resolving coreferences of objects (e.g., product names) and attributes (e.g., product aspects) in opinionated reviews is crucial for improving opinion mining performance. However, the task is challenging as one often needs to consider domain-specific knowledge (e.g., iPad is a tablet and has aspect resolution) to identify coreferences in opinionated reviews. Also, compiling a handcrafted and curated domain-specific knowledge base for each domain is very time-consuming and arduous. This paper proposes an approach to automatically mine and leverage domain-specific knowledge for classifying object and attribute coreferences. The approach extracts domain-specific knowledge from unlabeled review data and trains a knowledge-aware neural coreference classification model to leverage (useful) domain knowledge together with general commonsense knowledge for the task. Experimental evaluation on real-world datasets involving five domains (product types) shows the effectiveness of the approach.

DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis
Hu Xu | Bing Liu | Lei Shu | Philip Yu
Findings of the Association for Computational Linguistics: EMNLP 2020

This paper focuses on learning domain-oriented language models driven by end tasks, aiming to combine the worlds of both general-purpose language models (such as ELMo and BERT) and domain-specific language understanding. We propose DomBERT, an extension of BERT that learns from both an in-domain corpus and relevant domain corpora. This helps in learning domain language models with low resources. Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis (ABSA), demonstrating promising results.

Controllable Text Generation with Focused Variation
Lei Shu | Alexandros Papangelis | Yi-Chia Wang | Gokhan Tur | Hu Xu | Zhaleh Feizollahi | Bing Liu | Piero Molino
Findings of the Association for Computational Linguistics: EMNLP 2020

This work introduces Focused-Variation Network (FVN), a novel model to control language generation. The main problems in previous controlled language generation models range from the difficulty of generating text according to the given attributes, to the lack of diversity of the generated texts. FVN addresses these issues by learning disjoint discrete latent spaces for each attribute inside codebooks, which allows for both controllability and diversity, while at the same time generating fluent text. We evaluate FVN on two text generation datasets with annotated content and style, and show state-of-the-art performance as assessed by automatic and human evaluations.

2019

DOER: Dual Cross-Shared RNN for Aspect Term-Polarity Co-Extraction
Huaishao Luo | Tianrui Li | Bing Liu | Junbo Zhang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper focuses on two related subtasks of aspect-based sentiment analysis, namely aspect term extraction and aspect sentiment classification, which we call aspect term-polarity co-extraction. The former task is to extract aspects of a product or service from an opinion document, and the latter is to identify the polarity expressed in the document about these extracted aspects. Most existing algorithms address them as two separate tasks and solve them one by one, or only perform one task, which can be complicated for real applications. In this paper, we treat these two tasks as two sequence labeling problems and propose a novel Dual crOss-sharEd RNN framework (DOER) to generate all aspect term-polarity pairs of the input sentence simultaneously. Specifically, DOER involves a dual recurrent neural network to extract the respective representation of each task, and a cross-shared unit to consider the relationship between them. Experimental results demonstrate that the proposed framework outperforms state-of-the-art baselines on three benchmark datasets.

BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis
Hu Xu | Bing Liu | Lei Shu | Philip Yu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploited to answer user questions. We call this problem Review Reading Comprehension (RRC). To the best of our knowledge, no existing work has been done on RRC. In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. Since ReviewRC has limited training examples for RRC (and also for aspect-based sentiment analysis), we then explore a novel post-training approach on the popular language model BERT to enhance the performance of fine-tuning of BERT for RRC. To show the generality of the approach, the proposed post-training is also applied to some other review-based tasks such as aspect extraction and aspect sentiment classification in aspect-based sentiment analysis. Experimental results demonstrate that the proposed post-training is highly effective.
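
The general recipe behind post-training is to continue masked-language-modeling on in-domain review text before task fine-tuning. A minimal sketch of one such MLM step follows; the corpus, masking rate, and hyperparameters are placeholders, not the paper's exact setup.

```python
# One masked-language-modeling step on review text (illustrative recipe).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)

reviews = ["The battery life of this laptop is amazing.",
           "The screen is too dim outdoors."]

batch = tok(reviews, return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
# Mask 15% of non-special tokens; unmasked positions are ignored by the loss.
maskable = ~torch.tensor(
    [tok.get_special_tokens_mask(ids, already_has_special_tokens=True)
     for ids in labels.tolist()], dtype=torch.bool)
mask = (torch.rand(labels.shape) < 0.15) & maskable
batch["input_ids"][mask] = tok.mask_token_id
labels[~mask] = -100  # ignore index for the MLM loss

loss = model(**batch, labels=labels).loss
loss.backward(); opt.step(); opt.zero_grad()
```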

pdf bib
Modeling Multi-Action Policy for Task-Oriented Dialogues
Lei Shu | Hu Xu | Bing Liu | Piero Molino
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Dialogue management (DM) plays a key role in the quality of the interaction with the user in a task-oriented dialogue system. In most existing approaches, the agent predicts only one DM policy action per turn. This significantly limits the expressive power of the conversational agent and introduces unwanted turns of interactions that may challenge users’ patience. Longer conversations also lead to more errors and the system needs to be more robust to handle them. In this paper, we compare the performance of several models on the task of predicting multiple acts for each turn. A novel policy model is proposed based on a recurrent cell called gated Continue-Act-Slots (gCAS) that overcomes the limitations of the existing models. Experimental results show that gCAS outperforms other approaches. The datasets and code are available at https://leishu02.github.io/.
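
A simplified sketch of a Continue-Act-Slots style decoder cell: at each step it emits a continue/stop flag, a dialogue act, and a multi-label slot vector, so several acts can be produced per turn. The gating details of the real gCAS cell are omitted here.

```python
# Simplified CAS-style cell emitting (continue, act, slots) per step.
import torch
import torch.nn as nn

class CASCell(nn.Module):
    def __init__(self, dim=64, n_acts=10, n_slots=20):
        super().__init__()
        self.rnn = nn.GRUCell(dim, dim)
        self.continue_head = nn.Linear(dim, 2)    # continue vs. stop
        self.act_head = nn.Linear(dim, n_acts)    # which act to take
        self.slot_head = nn.Linear(dim, n_slots)  # which slots the act touches

    def forward(self, x, h):
        h = self.rnn(x, h)
        return (h, self.continue_head(h), self.act_head(h),
                torch.sigmoid(self.slot_head(h)))

cell = CASCell()
h = torch.zeros(1, 64)
x = torch.randn(1, 64)  # encoding of dialogue state / previous prediction
for _ in range(3):      # emit up to 3 (act, slots) tuples in one turn
    h, cont, act_logits, slot_probs = cell(x, h)
```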

pdf bib
Learning with Noisy Labels for Sentence-level Sentiment Classification
Hao Wang | Bing Liu | Chaozhuo Li | Yan Yang | Tianrui Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Deep neural networks (DNNs) can fit (or even over-fit) the training data very well. If a DNN model is trained using data with noisy labels and tested on data with clean labels, the model may perform poorly. This paper studies the problem of learning with noisy labels for sentence-level sentiment classification. We propose a novel DNN model called NetAb (as shorthand for convolutional neural Networks with Ab-networks) to handle noisy labels during training. NetAb consists of two convolutional neural networks, one with a noise transition layer for dealing with the input noisy labels and the other for predicting ‘clean’ labels. We train the two networks using their respective loss functions in a mutual reinforcement manner. Experimental results demonstrate the effectiveness of the proposed model.
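
The core mechanism is a noise transition layer: the clean-label posterior is pushed through a learned label-flip matrix to model the observed noisy labels. A much-simplified sketch of that layer follows; the two-network mutual training of NetAb is not shown.

```python
# Noise transition layer: P(noisy label) = P(clean label) @ T.
import torch
import torch.nn as nn

class NoiseTransition(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.T = nn.Parameter(torch.eye(n_classes) * 3.0)  # init near identity

    def forward(self, clean_probs):
        T = torch.softmax(self.T, dim=1)  # row i: P(noisy label | clean label i)
        return clean_probs @ T            # posterior over noisy labels

clean = torch.softmax(torch.randn(4, 2), dim=1)  # from the clean-prediction CNN
noisy = NoiseTransition()(clean)                 # trained against noisy labels
```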

pdf bib
Lifelong and Interactive Learning of Factual Knowledge in Dialogues
Sahisnu Mazumder | Bing Liu | Shuai Wang | Nianzu Ma
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Dialogue systems increasingly use knowledge bases (KBs) storing real-world facts to help generate quality responses. However, because KBs are inherently incomplete and remain fixed during a conversation, they limit dialogue systems’ ability to answer questions and to handle questions involving entities or relations that are not in the KB. In this paper, we propose an engine for Continuous and Interactive Learning of Knowledge (CILK) that gives dialogue systems the ability to continuously and interactively learn and infer new knowledge during conversations. With more knowledge accumulated over time, they will be able to learn better and answer more questions. Our empirical evaluation shows that CILK is promising.

pdf bib
Flexibly-Structured Model for Task-Oriented Dialogues
Lei Shu | Piero Molino | Mahdi Namazifar | Hu Xu | Bing Liu | Huaixiu Zheng | Gokhan Tur
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

This paper proposes a novel end-to-end architecture for task-oriented dialogue systems. It is based on a simple and practical yet very effective sequence-to-sequence approach, where language understanding and state tracking tasks are modeled jointly with a structured copy-augmented sequential decoder and a multi-label decoder for each slot. The policy engine and language generation tasks are then modeled jointly. The copy-augmented sequential decoder deals with new or unknown values in the conversation, while the multi-label decoder combined with the sequential decoder ensures the explicit assignment of values to slots. On the generation side, slot binary classifiers are used to improve performance. This architecture is scalable to real-world scenarios and is shown through an empirical evaluation to achieve state-of-the-art performance on both the Cambridge Restaurant dataset and the Stanford in-car assistant dataset.
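
To illustrate how a copy-augmented decoder can emit unknown slot values, here is a generic pointer-generator mixture: a gate blends generating from the vocabulary with copying a token from the input. This is a standard construction shown for intuition, simpler than the paper's decoder.

```python
# Generic pointer-generator output distribution (illustration only).
import torch

def copy_augmented_dist(vocab_logits, attn_weights, src_token_ids, p_gen):
    # vocab_logits: (vocab,), attn_weights: (src_len,), p_gen: scalar in (0,1)
    dist = p_gen * torch.softmax(vocab_logits, dim=-1)
    dist = dist.scatter_add(0, src_token_ids, (1 - p_gen) * attn_weights)
    return dist  # unknown slot values can be produced by copying

vocab_logits = torch.randn(100)
attn = torch.softmax(torch.randn(5), dim=-1)
src_ids = torch.tensor([7, 42, 3, 99, 7])  # source tokens mapped to vocab ids
dist = copy_augmented_dist(vocab_logits, attn, src_ids, p_gen=torch.tensor(0.7))
```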

2018

pdf bib
Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Bing Liu | Gokhan Tür | Dilek Hakkani-Tür | Pararth Shah | Larry Heck
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. The efficiency of such a learning method may suffer from the mismatch of dialogue state distributions between the offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interactions with users by learning from human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent’s capability in successfully completing a task.
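
A schematic of the hybrid scheme: supervised (imitation) updates on human-corrected actions, followed by policy-gradient updates on end-of-task user feedback. The `policy` model, states, and reward signal below are placeholders, not the paper's implementation.

```python
# Schematic imitation + REINFORCE updates for a dialogue policy.
import torch
import torch.nn.functional as F

def imitation_step(policy, opt, state, teacher_action):
    # Supervised update toward the action the human teacher demonstrated.
    loss = F.cross_entropy(policy(state), teacher_action)
    opt.zero_grad(); loss.backward(); opt.step()

def reinforce_step(policy, opt, states, actions, reward):
    # REINFORCE on a completed dialogue scored by end-of-task user feedback.
    logps = torch.stack([
        F.log_softmax(policy(s), dim=-1)[0, a] for s, a in zip(states, actions)])
    loss = -(reward * logps).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```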

pdf bib
Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning
Pararth Shah | Dilek Hakkani-Tür | Bing Liu | Gokhan Tür
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

End-to-end neural models show great promise towards building conversational agents that are trained from data and on-line experience using supervised and reinforcement learning. However, these models require a large corpus of dialogues to learn effectively. For goal-oriented dialogues, such datasets are expensive to collect and annotate, since each task involves a separate schema and database of entities. Further, the Wizard-of-Oz approach commonly used for dialogue collection does not provide sufficient coverage of salient dialogue flows, which is critical for guaranteeing an acceptable task completion rate in consumer-facing conversational agents. In this paper, we study a recently proposed approach for building an agent for arbitrary tasks by combining dialogue self-play and crowd-sourcing to generate fully-annotated dialogues with diverse and natural utterances. We discuss the advantages of this approach for industry applications of conversational agents, wherein an agent can be rapidly bootstrapped to deploy in front of users and further optimized via interactive learning from actual users of the system.

pdf bib
End-to-End Learning of Task-Oriented Dialogs
Bing Liu | Ian Lane
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

In this thesis proposal, we address the limitations of the conventional pipeline design of task-oriented dialog systems and propose end-to-end learning solutions. We design a neural network based dialog system that is able to robustly track dialog state, interface with knowledge bases, and incorporate structured query results into system responses to successfully complete task-oriented dialogs. In learning such neural network based dialog systems, we propose hybrid offline training and online interactive learning methods. We introduce a multi-task learning method for pre-training the dialog agent in a supervised manner using task-oriented dialog corpora. The supervised training agent can further be improved by interacting with users and learning online from user demonstration and feedback with imitation and reinforcement learning. To address the sample efficiency issue of online policy learning, we further propose a method that combines the learning-from-user and learning-from-simulation approaches to improve online interactive learning efficiency.

pdf bib
Target-Sensitive Memory Networks for Aspect Sentiment Classification
Shuai Wang | Sahisnu Mazumder | Bing Liu | Mianwei Zhou | Yi Chang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Aspect sentiment classification (ASC) is a fundamental task in sentiment analysis. Given an aspect/target and a sentence, the task classifies the sentiment polarity expressed on the target in the sentence. Memory networks (MNs) have been used for this task recently and have achieved state-of-the-art results. In MNs, attention mechanism plays a crucial role in detecting the sentiment context for the given target. However, we found an important problem with the current MNs in performing the ASC task. Simply improving the attention mechanism will not solve it. The problem is referred to as target-sensitive sentiment, which means that the sentiment polarity of the (detected) context is dependent on the given target and it cannot be inferred from the context alone. To tackle this problem, we propose the target-sensitive memory networks (TMNs). Several alternative techniques are designed for the implementation of TMNs and their effectiveness is experimentally evaluated.
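
To make the target-sensitivity problem concrete: in "high price" versus "high resolution", the same context word flips polarity depending on the target, so the score must interact with the target representation. Below is one simplified variant of such an interaction; the paper designs several alternatives.

```python
# Target-sensitive classification head (one simplified variant).
import torch
import torch.nn as nn

class TargetSensitiveHead(nn.Module):
    def __init__(self, dim=64, n_classes=3):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, n_classes)

    def forward(self, context_vec, target_vec):
        # The attention-derived context vector is combined multiplicatively
        # with the target embedding, so polarity depends on the target.
        return self.out(torch.tanh(self.W(context_vec) * target_vec))

logits = TargetSensitiveHead()(torch.randn(1, 64), torch.randn(1, 64))
```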

pdf bib
Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction
Hu Xu | Bing Liu | Lei Shu | Philip S. Yu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

One key task of fine-grained sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using deep learning. Unlike other highly sophisticated supervised deep learning models, this paper proposes a novel and yet simple CNN model employing two types of pre-trained embeddings for aspect extraction: general-purpose embeddings and domain-specific embeddings. Without using any additional supervision, this model achieves surprisingly good results, outperforming state-of-the-art sophisticated existing methods. To our knowledge, this paper is the first to report such double embeddings based CNN model for aspect extraction and achieve very good results.
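
A minimal sketch of the double-embedding idea: frozen general-purpose and domain-specific embeddings are concatenated, passed through 1-D convolutions, and classified per token into BIO aspect tags. Dimensions and layer count here are illustrative, not the paper's exact configuration.

```python
# Double-embedding CNN sequence labeler for aspect extraction (sketch).
import torch
import torch.nn as nn

class DECNN(nn.Module):
    def __init__(self, vocab=1000, d_gen=300, d_dom=100, n_tags=3):
        super().__init__()
        self.gen = nn.Embedding(vocab, d_gen)  # e.g., general GloVe (frozen)
        self.dom = nn.Embedding(vocab, d_dom)  # e.g., trained on reviews (frozen)
        self.conv = nn.Conv1d(d_gen + d_dom, 128, kernel_size=5, padding=2)
        self.out = nn.Linear(128, n_tags)      # B/I/O per token

    def forward(self, tokens):
        x = torch.cat([self.gen(tokens), self.dom(tokens)], dim=-1)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        return self.out(h)

tags = DECNN()(torch.randint(0, 1000, (2, 12)))  # (batch, seq, n_tags)
```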

pdf bib
An Attribute Enhanced Domain Adaptive Model for Cold-Start Spam Review Detection
Zhenni You | Tieyun Qian | Bing Liu
Proceedings of the 27th International Conference on Computational Linguistics

Spam detection has long been a research topic in both academia and industry due to its wide applications. Previous studies mainly focus on extracting linguistic or behavioral features to distinguish spam from legitimate reviews. Such features are either ineffective or take a long time to collect, and thus are hard to apply to cold-start spam review detection tasks. Recent advances leverage neural networks to encode textual and behavioral features for the cold-start problem. However, abundant attribute information is largely neglected by the existing framework. In this paper, we propose a novel deep learning architecture for incorporating entities and their inherent attributes from various domains into a unified framework. Specifically, our model encodes not only the entities of reviewer, item, and review, but also their attributes such as location, date, and price range. Furthermore, we present a domain classifier to adapt the knowledge from one domain to the other. With the abundant attributes of existing entities and knowledge from other domains, we successfully address the problem of data scarcity in cold-start settings. Experimental results on two Yelp datasets show that our proposed framework significantly outperforms state-of-the-art methods.
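
The abstract names a domain classifier for adaptation; a common way to realize one is gradient reversal (Ganin & Lempitsky), sketched below. That specific mechanism is an assumption for illustration, not confirmed by the paper.

```python
# Domain classifier on shared features via gradient reversal (assumption).
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None  # flip gradients into the encoder

def domain_logits(features, domain_head, lam=1.0):
    # The encoder is trained to fool the domain head, encouraging
    # domain-invariant features usable in the cold-start target domain.
    return domain_head(GradReverse.apply(features, lam))

head = torch.nn.Linear(64, 2)  # source vs. target domain
logits = domain_logits(torch.randn(8, 64, requires_grad=True), head)
```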

pdf bib
Adversarial Learning of Task-Oriented Neural Dialog Models
Bing Liu | Ian Lane
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

In this work, we propose an adversarial learning method for reward estimation in reinforcement learning (RL) based task-oriented dialog models. Most current RL based task-oriented dialog systems require access to a reward signal from either user feedback or user ratings. Such user ratings, however, may not always be consistent or available in practice. Furthermore, online dialog policy learning with RL typically requires a large number of queries to users, suffering from a sample efficiency problem. To address these challenges, we propose an adversarial learning method to learn dialog rewards directly from dialog samples. Such rewards are further used to optimize the dialog policy with policy-gradient-based RL. In an evaluation in a restaurant search domain, we show that the proposed adversarial dialog learning method achieves an advanced dialog success rate compared to strong baseline methods. We further discuss the covariate shift problem in online adversarial dialog learning and show how we can address it with partial access to user feedback.
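
A minimal sketch of the idea: a discriminator scores how human-like a dialogue is, and that score substitutes for explicit user ratings as the policy-gradient reward. The dialogue encoder and shapes below are placeholders.

```python
# Adversarially learned reward used in a policy-gradient objective (sketch).
import torch
import torch.nn as nn

discriminator = nn.Sequential(  # dialogue encoding -> P(human dialogue)
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

def dialog_reward(dialog_encoding):
    return discriminator(dialog_encoding).squeeze(-1)

def policy_gradient_loss(log_probs, dialog_encoding):
    # REINFORCE with the learned reward in place of explicit user ratings.
    reward = dialog_reward(dialog_encoding).detach()
    return -(reward * log_probs.sum())

loss = policy_gradient_loss(torch.randn(6).log_softmax(0), torch.randn(1, 128))
```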

2017

pdf bib
Sentiment Lexicon Expansion Based on Neural PU Learning, Double Dictionary Lookup, and Polarity Association
Yasheng Wang | Yang Zhang | Bing Liu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Although many sentiment lexicons in different languages exist, most are not comprehensive. In a recent sentiment analysis application, we used a large Chinese sentiment lexicon and found that it missed a large number of sentiment words in social media. This prompted us to make a new attempt to study sentiment lexicon expansion. This paper first poses the problem as a PU learning problem, which is a new formulation. It then proposes a new PU learning method suitable for our problem using a neural network. The results are enhanced further with a new dictionary-based technique and a novel polarity classification technique. Experimental results show that the proposed approach outperforms baseline methods greatly.
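 
To give the PU formulation some shape, here is the non-negative PU risk of Kiryo et al. (2017) as a generic stand-in: positives and unlabeled words train a scorer under an assumed class prior. This is not the specific PU method proposed in the paper.

```python
# Generic non-negative PU risk (Kiryo et al. 2017), shown for intuition.
import torch
import torch.nn.functional as F

def nnpu_loss(scores_pos, scores_unl, prior=0.3):
    # prior: assumed fraction of positives (sentiment words) among unlabeled.
    loss = lambda s, y: F.binary_cross_entropy_with_logits(
        s, torch.full_like(s, y))
    risk_pos = prior * loss(scores_pos, 1.0)
    risk_neg = loss(scores_unl, 0.0) - prior * loss(scores_pos, 0.0)
    return risk_pos + torch.clamp(risk_neg, min=0.0)  # keep risk non-negative

loss = nnpu_loss(torch.randn(16), torch.randn(64))
```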

pdf bib
DOC: Deep Open Classification of Text Documents
Lei Shu | Hu Xu | Bing Liu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Traditional supervised learning makes the closed-world assumption that the classes that appear in the test data must have appeared in training. This also applies to text learning or text classification. As learning is used increasingly in dynamic open environments where some new/test documents may not belong to any of the training classes, identifying these novel documents during classification presents an important problem. This problem is called open-world classification or open classification. This paper proposes a novel deep learning based approach. It outperforms existing state-of-the-art techniques dramatically.
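
A minimal sketch of DOC-style open classification at inference time: one sigmoid per seen class, and a document is rejected as unseen if no class probability clears its threshold. Fixed thresholds are used here for brevity; the paper fits them per class.

```python
# 1-vs-rest sigmoids with a rejection option for unseen classes (sketch).
import torch

def doc_predict(logits, thresholds):
    probs = torch.sigmoid(logits)           # (batch, n_seen_classes)
    best_p, best_c = probs.max(dim=-1)
    rejected = best_p < thresholds[best_c]  # nothing fires -> novel document
    return torch.where(rejected, torch.full_like(best_c, -1), best_c)

logits = torch.randn(4, 5)
pred = doc_predict(logits, thresholds=torch.full((5,), 0.5))  # -1 = unseen
```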

pdf bib
Lifelong Learning CRF for Supervised Aspect Extraction
Lei Shu | Hu Xu | Bing Liu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper makes a focused contribution to supervised aspect extraction. It shows that if the system has performed aspect extraction in many past domains and retained the results as knowledge, Conditional Random Fields (CRF) can leverage this knowledge in a lifelong learning manner to extract aspects in a new domain markedly better than the traditional CRF without this prior knowledge. The key innovation is that even after CRF training, the model can still improve its extraction with experience from its applications.
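
One simple way to picture injecting retained knowledge into a CRF is as an extra feature: tokens reliably extracted as aspects in past domains receive a "known aspect" indicator. The feature template below is illustrative; the paper's lifelong mechanism is richer than this.

```python
# Illustrative CRF feature template with a lifelong-knowledge feature.
past_aspects = {"battery", "screen", "resolution"}  # mined from prior domains

def token_features(sent, i):
    w = sent[i].lower()
    return {
        "word": w,
        "suffix3": w[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
        "known_aspect": w in past_aspects,  # knowledge retained from the past
    }

sent = "The battery drains quickly".split()
X = [token_features(sent, i) for i in range(len(sent))]  # feed to a CRF tagger
```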

2016

pdf bib
Lifelong-RL: Lifelong Relaxation Labeling for Separating Entities and Aspects in Opinion Targets
Lei Shu | Bing Liu | Hu Xu | Annice Kim
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

bib
Lifelong Machine Learning for Natural Language Processing
Zhiyuan Chen | Bing Liu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Machine learning (ML) has been successfully used as a prevalent approach to solving numerous NLP problems. However, the classic ML paradigm learns in isolation. That is, given a dataset, an ML algorithm is executed on the dataset to produce a model without using any related or prior knowledge. Although this type of isolated learning is very useful, it also has serious limitations, as it does not accumulate knowledge learned in the past and use that knowledge to help future learning, which is the hallmark of human learning and human intelligence. Lifelong machine learning (LML) aims to achieve this capability. Specifically, it aims to design and develop computational learning systems and algorithms that learn as humans do, i.e., retaining the results learned in the past, abstracting knowledge from them, and using the knowledge to help future learning. In this tutorial, we will introduce existing research on LML and show that LML is well suited for NLP tasks and has the potential to help NLP make major progress.

pdf bib
Breaking the Closed World Assumption in Text Classification
Geli Fei | Bing Liu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Joint Online Spoken Language Understanding and Language Modeling With Recurrent Neural Networks
Bing Liu | Ian Lane
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2015

pdf bib
Social Media Text Classification under Negative Covariate Shift
Geli Fei | Bing Liu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Lifelong Learning for Sentiment Classification
Zhiyuan Chen | Nianzu Ma | Bing Liu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib
Exploiting Social Relations and Sentiment for Stock Prediction
Jianfeng Si | Arjun Mukherjee | Bing Liu | Sinno Jialin Pan | Qing Li | Huayi Li
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Aspect Extraction with Automated Prior Knowledge Learning
Zhiyuan Chen | Arjun Mukherjee | Bing Liu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Tri-Training for Authorship Attribution with Limited Training Data
Tieyun Qian | Bing Liu | Li Chen | Zhiyong Peng
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Review Topic Discovery with Phrases using the Pólya Urn Model
Geli Fei | Zhiyuan Chen | Bing Liu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
Identifying Multiple Userids of the Same Author
Tieyun Qian | Bing Liu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Exploiting Domain Knowledge in Aspect Extraction
Zhiyuan Chen | Arjun Mukherjee | Bing Liu | Meichun Hsu | Malu Castellanos | Riddhiman Ghosh
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Discovering User Interactions in Ideological Discussions
Arjun Mukherjee | Bing Liu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Public Dialogue: Analysis of Tolerance in Online Discussions
Arjun Mukherjee | Vivek Venkataraman | Bing Liu | Sharon Meraz
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Exploiting Topic based Twitter Sentiment for Stock Prediction
Jianfeng Si | Arjun Mukherjee | Bing Liu | Qing Li | Huayi Li | Xiaotie Deng
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Identifying Intention Posts in Discussion Forums
Zhiyuan Chen | Bing Liu | Meichun Hsu | Malu Castellanos | Riddhiman Ghosh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

pdf bib
Modeling Review Comments
Arjun Mukherjee | Bing Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Aspect Extraction through Semi-Supervised Modeling
Arjun Mukherjee | Bing Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Analysis of Linguistic Style Accommodation in Online Debates
Arjun Mukherjee | Bing Liu
Proceedings of COLING 2012

pdf bib
A Dictionary-Based Approach to Identifying Aspects Implied by Adjectives for Opinion Mining
Geli Fei | Bing Liu | Meichun Hsu | Malu Castellanos | Riddhiman Ghosh
Proceedings of COLING 2012: Posters

2011

pdf bib
Identifying Noun Product Features that Imply Opinions
Lei Zhang | Bing Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Extracting Resource Terms for Sentiment Analysis
Lei Zhang | Bing Liu
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Opinion Word Expansion and Target Extraction through Double Propagation
Guang Qiu | Bing Liu | Jiajun Bu | Chun Chen
Computational Linguistics, Volume 37, Issue 1 - March 2011

2010

pdf bib
Distributional Similarity vs. PU Learning for Entity Set Expansion
Xiao-Li Li | Lei Zhang | Bing Liu | See-Kiong Ng
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Improving Gender Classification of Blog Authors
Arjun Mukherjee | Bing Liu
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Negative Training Data Can be Harmful to Text Classification
Xiao-Li Li | Bing Liu | See-Kiong Ng
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Soochow University: Description and Analysis of the Chinese Word Sense Induction System for CLP2010
Hua Xu | Bing Liu | Longhua Qian | Guodong Zhou
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Resolving Object and Attribute Coreference in Opinion Mining
Xiaowen Ding | Bing Liu
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Grouping Product Features Using Semi-Supervised Learning with Soft-Constraints
Zhongwu Zhai | Bing Liu | Hua Xu | Peifa Jia
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Dependency-Driven Feature-based Learning for Extracting Protein-Protein Interactions from Biomedical Text
Bing Liu | Longhua Qian | Hongling Wang | Guodong Zhou
Coling 2010: Posters

pdf bib
Extracting and Ranking Product Features in Opinion Documents
Lei Zhang | Bing Liu | Suk Hwan Lim | Eamonn O’Brien-Strain
Coling 2010: Posters

2009

pdf bib
Sentiment Analysis of Conditional Sentences
Ramanathan Narayanan | Bing Liu | Alok Choudhary
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf bib
Mining Opinions in Comparative Sentences
Murthy Ganapathibhotla | Bing Liu
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
