Yang Liu

刘扬; Ph.D Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon

Also published as: Y. Liu

Other people with similar names: Yang Janet Liu (Georgetown University; 刘洋), Yang Liu (May refer to several people), Yang Liu (3M Health Information Systems), Yang Liu (University of Helsinki), Yang Liu (Beijing Language and Culture University), Yang Liu (National University of Defense Technology), Yang Liu (Edinburgh Ph.D., Microsoft), Yang Liu (The Chinese University of Hong Kong (Shenzhen)), Yang Liu (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence), Yang Liu (Microsoft Cognitive Services Research), Yang Liu (Peking University), Yang Liu (Samsung Research Center Beijing), Yang Liu (Tianjin University, China), Yang Liu (Univ. of Michigan, UC Santa Cruz), Yang Liu (Wilfrid Laurier University)


2023

PLACES: Prompting Language Models for Social Conversation Synthesis
Maximillian Chen | Alexandros Papangelis | Chenyang Tao | Seokhwan Kim | Andy Rosenbaum | Yang Liu | Zhou Yu | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: EACL 2023

Collecting high quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting large language models. In this work, we use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting. We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations. This includes various dimensions of conversation quality with human evaluation directly on the synthesized conversations, and interactive human evaluation of chatbots fine-tuned on the synthetically generated dataset. We additionally demonstrate that this prompting approach is generalizable to multi-party conversations, providing potential to create new synthetic data for multi-party tasks. Our synthetic multi-party conversations were rated more favorably across all measured dimensions compared to conversation excerpts sampled from a human-collected multi-party dataset.

Clinical note section classification on doctor-patient conversations in low-resourced settings
Zhuohao Chen | Jangwon Kim | Yang Liu | Shrikanth Narayanan
Proceedings of the Third Workshop on NLP for Medical Conversations

“What do others think?”: Task-Oriented Conversational Modeling with Subjective Knowledge
Chao Zhao | Spandana Gella | Seokhwan Kim | Di Jin | Devamanyu Hazarika | Alexandros Papangelis | Behnam Hedayatnia | Mahdi Namazifar | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Task-oriented Dialogue (TOD) Systems aim to build dialogue systems that assist users in accomplishing specific goals, such as booking a hotel or a restaurant. Traditional TODs rely on domain-specific APIs/DBs or external factual knowledge to generate responses, which cannot accommodate subjective user requests (e.g., “Is the WIFI reliable?” or “Does the restaurant have a good atmosphere?”). To address this issue, we propose a novel task of subjective-knowledge-based TOD (SK-TOD). We also propose the first corresponding dataset, which contains subjective knowledge-seeking dialogue contexts and manually annotated responses grounded in subjective knowledge sources. When evaluated with existing TOD approaches, we find that this task poses new challenges such as aggregating diverse opinions from multiple knowledge snippets. We hope this task and dataset can promote further research on TOD and subjective content understanding. The code and the dataset are available at https://github.com/alexa/dstc11-track5.

Investigating the Representation of Open Domain Dialogue Context for Transformer Models
Vishakh Padmakumar | Behnam Hedayatnia | Di Jin | Patrick Lange | Seokhwan Kim | Nanyun Peng | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

The bulk of work adapting transformer models to open-domain dialogue represents dialogue context as the concatenated set of turns in natural language. However, it is unclear if this is the best approach. In this work, we investigate this question by means of an empirical controlled experiment varying the dialogue context format from text-only formats (all recent utterances, summaries, selected utterances) as well as variants that are more structurally different (triples, AMR). We compare these formats based on fine-tuned model performance on two downstream tasks—knowledge selection and response generation. We find that simply concatenating the utterances works as a strong baseline in most cases, but is outperformed in longer contexts by a hybrid approach of combining a summary of the context with recent utterances. Through empirical analysis, our work highlights the need to examine the format of context representation and offers recommendations on adapting general-purpose language models to dialogue tasks.

MERCY: Multiple Response Ranking Concurrently in Realistic Open-Domain Conversational Systems
Sarik Ghazarian | Behnam Hedayatnia | Di Jin | Sijia Liu | Nanyun Peng | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Automatic Evaluation (AE) and Response Selection (RS) models assign quality scores to various candidate responses and rank them in conversational setups. Prior response ranking research compares various models’ performance on synthetically generated test sets. In this work, we investigate the performance of model-based reference-free AE and RS models on our constructed response ranking datasets that mirror real-case scenarios of ranking candidates during inference time. Metrics’ unsatisfying performance can be interpreted as their low generalizability over more pragmatic conversational domains such as human-chatbot dialogs. To alleviate this issue we propose a novel RS model called MERCY that simulates human behavior in selecting the best candidate by taking into account distinct candidates concurrently and learns to rank them. In addition, MERCY leverages natural language feedback as another component to help the ranking task by explaining why each candidate response is relevant/irrelevant to the dialog context. These feedbacks are generated by prompting large language models in a few-shot setup. Our experiments show the better performance of MERCY over baselines for the response ranking task in our curated realistic datasets.

Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information
Yen-Ting Lin | Alexandros Papangelis | Seokhwan Kim | Sungjin Lee | Devamanyu Hazarika | Mahdi Namazifar | Di Jin | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

This work focuses on in-context data augmentation for intent detection. Having found that augmentation via in-context prompting of large pre-trained language models (PLMs) alone does not improve performance, we introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model. Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents. It then employs intent-aware filtering, based on PVI, to remove datapoints that are not helpful to the downstream intent classifier. Our method is thus able to leverage the expressive power of large language models to produce diverse training data. Empirical results demonstrate that our method can produce synthetic training data that achieve state-of-the-art performance on three challenging intent detection datasets under few-shot settings (1.28% absolute improvement in 5-shot and 1.18% absolute in 10-shot, on average) and perform on par with the state-of-the-art in full-shot settings (within 0.01% absolute, on average).
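
As a rough illustration of the PVI-based filtering step described in this abstract, the sketch below computes pointwise V-information as the log-probability gain for the gold intent when a model sees the utterance versus a null input, and keeps only synthetic utterances whose PVI clears a per-intent threshold. The function names, the example probabilities, and the threshold values are hypothetical and are not taken from the paper; in practice the two probabilities would come from fine-tuned classifiers.

```python
import math


def pvi(p_with_input: float, p_null: float) -> float:
    """Pointwise V-information in bits for one (utterance, intent) pair.

    p_with_input: probability of the gold intent given the utterance.
    p_null: probability of the gold intent given an empty (null) input.
    """
    return math.log2(p_with_input) - math.log2(p_null)


def filter_by_pvi(candidates, thresholds):
    """Keep synthetic datapoints whose PVI reaches the threshold of their intent.

    candidates: iterable of (utterance, intent, p_with_input, p_null).
    thresholds: dict mapping intent -> minimum PVI (assumed per-intent values).
    """
    kept = []
    for utterance, intent, p_with_input, p_null in candidates:
        if pvi(p_with_input, p_null) >= thresholds[intent]:
            kept.append((utterance, intent))
    return kept


# Hypothetical synthetic utterances with made-up model probabilities.
candidates = [
    ("book a table for two", "restaurant", 0.8, 0.2),
    ("hello there", "restaurant", 0.1, 0.2),
]
thresholds = {"restaurant": 0.5}
kept = filter_by_pvi(candidates, thresholds)
```

Here the second utterance is less likely under the input-conditioned model than under the null model, so its PVI is negative and it is dropped.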

2022

A Systematic Evaluation of Response Selection for Open Domain Dialogue
Behnam Hedayatnia | Di Jin | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Recent progress on neural approaches for language processing has triggered a resurgence of interest in building intelligent open-domain chatbots. However, even the state-of-the-art neural chatbots cannot produce satisfying responses for every turn in a dialog. A practical solution is to generate multiple response candidates for the same context, and then perform response ranking/selection to determine which candidate is the best. Previous work in response selection typically trains response rankers using synthetic data that is formed from existing dialogs by using a ground truth response as the single appropriate response and constructing inappropriate responses via random selection or using adversarial methods. In this work, we curated a dataset where responses from multiple response generators produced for the same dialog context are manually annotated as appropriate (positive) and inappropriate (negative). We argue that such training data better matches the actual use case examples, enabling the models to learn to rank responses effectively. With this new dataset, we conduct a systematic evaluation of state-of-the-art methods for response selection, and demonstrate that both strategies of using multiple positive candidates and using manually verified hard negative candidates can bring significant performance improvements in comparison to using the adversarial training data, e.g., increases of 3% and 13% in Recall@1 score, respectively.
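
To make the Recall@1 metric mentioned in this abstract concrete, here is a minimal sketch under one common reading: for each dialog context, the ranker scores every candidate, and the context counts as a hit if the top-scored candidate is annotated as appropriate. The function name and the toy scores are hypothetical, and the paper may define the metric differently when a context has several positive candidates.

```python
def recall_at_1(examples):
    """Fraction of contexts whose top-scored candidate is labeled appropriate.

    examples: list of (scores, labels) pairs, one per dialog context, where
    scores[i] is the ranker score of candidate i and labels[i] is 1 for an
    appropriate response and 0 otherwise.
    """
    hits = 0
    for scores, labels in examples:
        # Index of the highest-scoring candidate for this context.
        top = max(range(len(scores)), key=scores.__getitem__)
        hits += labels[top]
    return hits / len(examples)


# Two toy contexts: the ranker picks a positive in the first, a negative in the second.
examples = [
    ([0.9, 0.1, 0.3], [1, 0, 1]),
    ([0.2, 0.8], [1, 0]),
]
score = recall_at_1(examples)
```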

Improving Bot Response Contradiction Detection via Utterance Rewriting
Di Jin | Sijia Liu | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Though chatbots based on large neural models can often produce fluent responses in open domain conversations, one salient error type is contradiction or inconsistency with the preceding conversation turns. Previous work has treated contradiction detection in bot responses as a task similar to natural language inference, e.g., detect the contradiction between a pair of bot utterances. However, utterances in conversations may contain co-references or ellipsis, and using these utterances as is may not always be sufficient for identifying contradictions. This work aims to improve the contradiction detection via rewriting all bot utterances to restore co-references and ellipsis. We curated a new dataset for utterance rewriting and built a rewriting model on it. We empirically demonstrate that this model can produce satisfactory rewrites to make bot utterances more complete. Furthermore, using rewritten utterances improves contradiction detection performance significantly, e.g., the AUPR and joint accuracy scores (detecting contradiction along with evidence) increase by 6.5% and 4.5% (absolute increase), respectively.

Automatically Detecting Reduced-formed English Pronunciations by Using Deep Learning
Lei Chen | Chenglin Jiang | Yiwei Gu | Yang Liu | Jiahong Yuan
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)

Reduced form pronunciations are widely used by native English speakers, especially in casual conversations. Second language (L2) learners have difficulty in processing reduced form pronunciations in listening comprehension and face challenges in production too. Meanwhile, training applications dedicated to reduced forms are still few. To solve this issue, we report on our first effort of using deep learning to evaluate L2 learners’ reduced form pronunciations. Compared with a baseline solution that uses an ASR to determine regular or reduced-formed pronunciations, a classifier that learns representative features via a convolutional neural network (CNN) on low-level acoustic features yields higher detection performance. The F-1 metric increases from 0.690 to 0.757 on the reduction task. Furthermore, adding word entities to compute attention weights to better adjust the features learned by the CNN model helps increase F-1 to 0.763.

Leveraging Seq2seq Language Generation for Multi-level Product Issue Identification
Yang Liu | Varnith Chordia | Hua Li | Siavash Fazeli Dehkordy | Yifei Sun | Vincent Gao | Na Zhang
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

In a leading e-commerce business, we receive hundreds of millions of customer feedback from different text communication channels such as product reviews. The feedback can contain rich information regarding customers’ dissatisfaction in the quality of goods and services. To harness such information to better serve customers, in this paper, we created a machine learning approach to automatically identify product issues and uncover root causes from the customer feedback text. We identify issues at two levels: coarse grained (L-Coarse) and fine grained (L-Granular). We formulate this multi-level product issue identification problem as a seq2seq language generation problem. Specifically, we utilize transformer-based seq2seq models due to their versatility and strong transfer-learning capability. We demonstrate that our approach is label efficient and outperforms the traditional approach such as multi-class multi-label classification formulation. Based on human evaluation, our fine-tuned model achieves 82.1% and 95.4% human-level performance for L-Coarse and L-Granular issue identification, respectively. Furthermore, our experiments illustrate that the model can generalize to identify unseen L-Granular issues.

Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
Pei Zhou | Karthik Gopalakrishnan | Behnam Hedayatnia | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit knowledge, such as common sense, is key to fluid human conversations. Current neural response generation (RG) models are trained to generate responses directly, omitting unstated implicit knowledge. In this paper, we present Think-Before-Speaking (TBS), a generative approach to first externalize implicit commonsense knowledge (think) and use this knowledge to generate responses (speak). We argue that externalizing implicit knowledge allows more efficient learning, produces more informative responses, and enables more explainable models. We analyze different choices to collect knowledge-aligned dialogues, represent implicit knowledge, and transition between knowledge and dialogues. Empirical results show TBS models outperform end-to-end and knowledge-augmented RG baselines on most automatic metrics and generate more informative, specific, and commonsense-following responses, as evaluated by human annotators. TBS also generates knowledge that makes sense and is relevant to the dialogue around 85% of the time.

Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning
Yifan Chen | Devamanyu Hazarika | Mahdi Namazifar | Yang Liu | Di Jin | Dilek Hakkani-Tur
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Prefix-tuning, or more generally continuous prompt tuning, has become an essential paradigm of parameter-efficient transfer learning. Using a large pre-trained language model (PLM), prefix-tuning can obtain strong performance by training only a small portion of parameters. In this paper, we propose to understand and further develop prefix-tuning through the kernel lens. Specifically, we make an analogy between prefixes and inducing variables in kernel methods and hypothesize that prefixes serving as inducing variables would improve their overall mechanism. From the kernel estimator perspective, we suggest a new variant of prefix-tuning—inducer-tuning, which shares the exact mechanism as prefix-tuning while leveraging the residual form found in adapter-tuning. This mitigates the initialization issue in prefix-tuning. Through comprehensive empirical experiments on natural language understanding and generation tasks, we demonstrate that inducer-tuning can close the performance gap between prefix-tuning and fine-tuning.

A Template-based Method for Constrained Neural Machine Translation
Shuo Wang | Peng Li | Zhixing Tan | Zhaopeng Tu | Maosong Sun | Yang Liu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Machine translation systems are expected to cope with various types of constraints in many practical scenarios. While neural machine translation (NMT) has achieved strong performance in unconstrained cases, it is non-trivial to impose pre-specified constraints into the translation process of NMT models. Although many approaches have been proposed to address this issue, most existing methods cannot satisfy the following three desiderata at the same time: (1) high translation quality, (2) high match accuracy, and (3) low latency. In this work, we propose a template-based method that can yield results with high translation quality and match accuracy, while its inference speed is comparable with that of unconstrained NMT models. Our basic idea is to rearrange the generation of constrained and unconstrained tokens through a template. Our method does not require any changes in the model architecture and the decoding algorithm. Experimental results show that the proposed template-based approach can outperform several representative baselines in both lexically and structurally constrained translation tasks.

ParaTag: A Dataset of Paraphrase Tagging for Fine-Grained Labels, NLG Evaluation, and Data Augmentation
Shuohang Wang | Ruochen Xu | Yang Liu | Chenguang Zhu | Michael Zeng
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Paraphrase identification has been formulated as a binary classification task to decide whether two sentences hold a paraphrase relationship. Existing paraphrase datasets only annotate a binary label for each sentence pair. However, after a systematic analysis of existing paraphrase datasets, we found that the degree of paraphrase cannot be well characterized by a single binary label. Moreover, the criteria of paraphrase are not even consistent within the same dataset. We hypothesize that such issues would limit the effectiveness of paraphrase models trained on these data. To this end, we propose a novel fine-grained paraphrase annotation schema that labels the minimum spans of tokens in a sentence that don’t have the corresponding paraphrases in the other sentence. Under this setting, we frame paraphrasing as a sequence tagging task. We collect 30k sentence pairs in English with the new annotation schema, resulting in the ParaTag dataset. In addition to reporting baseline results on ParaTag using state-of-the-art language models, we show that ParaTag is especially useful for training an automatic scorer for language generation evaluation. Finally, we train a paraphrase generation model from ParaTag and achieve better data augmentation performance on the GLUE benchmark than other public paraphrasing datasets.

Entropy-Based Vocabulary Substitution for Incremental Learning in Multilingual Neural Machine Translation
Kaiyu Huang | Peng Li | Jin Ma | Yang Liu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In a practical real-world scenario, the longstanding goal is that a universal multilingual translation model can be incrementally updated when new language pairs arrive. Specifically, the initial vocabulary only covers some of the words in new languages, which hurts the translation quality for incremental learning. Although existing approaches attempt to address this issue by replacing the original vocabulary with a rebuilt vocabulary or constructing independent language-specific vocabularies, these methods cannot meet the following three demands simultaneously: (1) high translation quality for original and incremental languages, (2) low cost for model training, (3) low time overhead for preprocessing. In this work, we propose an entropy-based vocabulary substitution (EVS) method that only needs to walk through the new language pairs for incremental learning during large-scale multilingual data updates while keeping the size of the vocabulary unchanged. Our method can learn new knowledge from updated training samples incrementally while keeping high translation quality for original language pairs, alleviating the issue of catastrophic forgetting. Results of experiments show that EVS can achieve better performance and save excess overhead for incremental learning in the multilingual machine translation task.

CGF: Constrained Generation Framework for Query Rewriting in Conversational AI
Jie Hao | Yang Liu | Xing Fan | Saurabh Gupta | Saleh Soltan | Rakesh Chada | Pradeep Natarajan | Chenlei Guo | Gokhan Tur
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

In conversational AI agents, Query Rewriting (QR) plays a crucial role in reducing user frictions and satisfying their daily demands. User frictions are caused by various reasons, such as errors in the conversational AI system, users’ accent or their abridged language. In this work, we present a novel Constrained Generation Framework (CGF) for query rewriting at both global and personalized levels. It is based on the encoder-decoder framework, where the encoder takes the query and its previous dialogue turns as the input to form a context-enhanced representation, and the decoder uses constrained decoding to generate the rewrites based on the pre-defined global or personalized constrained decoding space. Extensive offline and online A/B experiments show that the proposed CGF significantly boosts the query rewriting performance.

VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator
Ayush Shrivastava | Karthik Gopalakrishnan | Yang Liu | Robinson Piramuthu | Gokhan Tur | Devi Parikh | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: ACL 2022

Interactive robots navigating photo-realistic environments need to be trained to effectively leverage and handle the dynamic nature of dialogue in addition to the challenges underlying vision-and-language navigation (VLN). In this paper, we present VISITRON, a multi-modal Transformer-based navigator better suited to the interactive regime inherent to Cooperative Vision-and-Dialog Navigation (CVDN). VISITRON is trained to: i) identify and associate object-level concepts and semantics between the environment and dialogue history, ii) identify when to interact vs. navigate via imitation learning of a binary classification head. We perform extensive pre-training and fine-tuning ablations with VISITRON to gain empirical insights and improve performance on CVDN. VISITRON’s ability to identify when to interact leads to a natural generalization of the game-play mode introduced by Roman et al. (2020) for enabling the use of such models in different environments. VISITRON is competitive with models on the static CVDN leaderboard and attains state-of-the-art performance on the Success weighted by Path Length (SPL) metric.

What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Sarik Ghazarian | Behnam Hedayatnia | Alexandros Papangelis | Yang Liu | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: ACL 2022

Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs with weak supervision, without requiring manual system turn quality annotations. Experiments show that our model is comparable to models trained on human annotated data. Furthermore, our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.

Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention
Yifan Chen | Devamanyu Hazarika | Mahdi Namazifar | Yang Liu | Di Jin | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: NAACL 2022

The massive amount of trainable parameters in the pre-trained language models (PLMs) makes them hard to be deployed to multiple downstream tasks. To address this issue, parameter-efficient transfer learning methods have been proposed to tune only a few parameters during fine-tuning while freezing the rest. This paper looks at existing methods along this line through the kernel lens. Motivated by the connection between self-attention in transformer-based PLMs and kernel learning, we propose kernel-wise adapters, namely Kernel-mix, that utilize the kernel structure in self-attention to guide the assignment of the tunable parameters. These adapters use guidelines found in classical kernel learning and enable separate parameter tuning for each attention head. Our empirical results, over a diverse set of natural language generation and understanding tasks, show that our proposed adapters can attain or improve the strong performance of existing baselines.

Analyzing the Limits of Self-Supervision in Handling Bias in Language
Lisa Bauer | Karthik Gopalakrishnan | Spandana Gella | Yang Liu | Mohit Bansal | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: EMNLP 2022

Prompting inputs with natural language task descriptions has emerged as a popular mechanism to elicit reasonably accurate outputs from large-scale generative language models with little to no in-context supervision. This also helps gain insight into how well language models capture the semantics of a wide range of downstream tasks purely from self-supervised pre-training on massive corpora of unlabeled text. Such models have naturally also been exposed to a lot of undesirable content like racist and sexist language and there is only some work on awareness of models along these dimensions. In this paper, we define and comprehensively evaluate how well such language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing. We define three broad classes of task descriptions for these tasks: statement, question, and completion, with numerous lexical variants within each class. We study the efficacy of prompting for each task using these classes and the null task description across several decoding methods and few-shot examples. Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation. We believe our work is an important step towards unbiased language models by quantifying the limits of current self-supervision objectives at accomplishing such sociologically challenging tasks.

Enhancing Knowledge Selection for Grounded Dialogues via Document Semantic Graphs
Sha Li | Mahdi Namazifar | Di Jin | Mohit Bansal | Heng Ji | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Providing conversation models with background knowledge has been shown to make open-domain dialogues more informative and engaging. Existing models treat knowledge selection as a sentence ranking or classification problem where each sentence is handled individually, ignoring the internal semantic connection between sentences. In this work, we propose to automatically convert the background knowledge documents into document semantic graphs and then perform knowledge selection over such graphs. Our document semantic graphs preserve sentence-level information through the use of sentence nodes and provide concept connections between sentences. We apply multi-task learning to perform sentence-level knowledge selection and concept-level knowledge selection, showing that it improves sentence-level selection. Our experiments show that our semantic graph-based knowledge selection improves over sentence selection baselines for both the knowledge selection task and the end-to-end response generation task on HollE and improves generalization on unseen topics in WoW.

Overcoming Catastrophic Forgetting During Domain Adaptation of Seq2seq Language Generation
Dingcheng Li | Zheng Chen | Eunah Cho | Jie Hao | Xiaohu Liu | Fan Xing | Chenlei Guo | Yang Liu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Seq2seq language generation models that are trained offline with multiple domains in a sequential fashion often suffer from catastrophic forgetting. Lifelong learning has been proposed to handle this problem. However, existing work such as experience replay or elastic weighted consolidation requires incremental memory space. In this work, we propose an innovative framework, RMR_DSE, that leverages a recall optimization mechanism to selectively memorize important parameters of previous tasks via regularization, and uses a domain drift estimation algorithm to compensate the drift between different domains in the embedding space. These designs enable the model to be trained on the current task while keeping the memory of previous tasks, and avoid much additional data storage. Furthermore, RMR_DSE can be combined with existing lifelong learning approaches. Our experiments on two seq2seq language generation tasks, paraphrase and dialog response generation, show that RMR_DSE outperforms SOTA models by a considerable margin and reduces forgetting greatly.

2021

基于词信息嵌入的汉语构词结构识别研究(Chinese Word-Formation Prediction based on Representations of Word-Related Features)
Hua Zheng (郑婳) | Yaqi Yan (殷雅琦) | Yue Wang (王悦) | Damai Dai (代达劢) | Yang Liu (刘扬)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

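As a paratactic language, Chinese has word-formation structures that characterize the combinatorial relations among word components and are key to recognizing and understanding word meaning. In Chinese information processing, previous work on word-formation structure recognition has mostly adopted coarse-grained syntactic-level labels and modeled mainly inter-word information such as context, overlooking the role that intra-word information such as morpheme meaning and word meaning plays in recognizing word-formation structure. This paper adopts a word-formation label scheme grounded in linguistics, builds a dataset of Chinese word-formation structures and related information, and proposes a model based on Bi-LSTM and self-attention to explore the potential influence of intra-word and inter-word information on word-formation structure recognition and the performance that can be achieved. The experiments achieve good prediction results, with an accuracy of 77.87% and an F1 score of 78.36%. Comparative tests further reveal that intra-word morpheme-meaning information contributes significantly to word-formation structure recognition, whereas inter-word contextual information contributes weakly and is rather unstable. The prediction method and dataset will offer new perspectives and solutions for a range of Chinese information processing tasks, such as morpheme and word structure analysis, word sense recognition and generation, and language research and dictionary compilation.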

Optimizing NLU Reranking Using Entity Resolution Signals in Multi-domain Dialog Systems
Tong Wang | Jiangning Chen | Mohsen Malmir | Shuyan Dong | Xin He | Han Wang | Chengwei Su | Yue Liu | Yang Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

In dialog systems, the Natural Language Understanding (NLU) component typically makes the interpretation decision (including domain, intent and slots) for an utterance before the mentioned entities are resolved. This may result in intent classification and slot tagging errors. In this work, we propose to leverage Entity Resolution (ER) features in NLU reranking and introduce a novel loss term based on ER signals to better learn model weights in the reranking framework. In addition, for a multi-domain dialog scenario, we propose a score distribution matching method to ensure scores generated by the NLU reranking models for different domains are properly calibrated. In offline experiments, we demonstrate our proposed approach significantly outperforms the baseline model on both single-domain and cross-domain evaluations.

Entity Resolution in Open-domain Conversations
Mingyue Shang | Tong Wang | Mihail Eric | Jiangning Chen | Jiyang Wang | Matthew Welch | Tiantong Deng | Akshay Grewal | Han Wang | Yue Liu | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

In recent years, incorporating external knowledge for response generation in open-domain conversation systems has attracted great interest. To improve the relevancy of retrieved knowledge, we propose a neural entity linking (NEL) approach. Different from formal documents, such as news, conversational utterances are informal and multi-turn, which makes it more challenging to disambiguate the entities. Therefore, we present a context-aware named entity recognition model (NER) and entity resolution (ER) model to utilize dialogue context information. We conduct NEL experiments on three open-domain conversation datasets and validate that incorporating context information improves the performance of NER and ER models. The end-to-end NEL approach outperforms the baseline by 62.8% relatively in F1 metric. Furthermore, we verify that using external knowledge based on NEL benefits the neural response generation model.

Learning to Selectively Learn for Weakly-supervised Paraphrase Generation
Kaize Ding | Dingcheng Li | Alexander Hanbo Li | Xing Fan | Chenlei Guo | Yang Liu | Huan Liu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Paraphrase generation is a longstanding NLP task that has diverse applications on downstream NLP tasks. However, the effectiveness of existing efforts predominantly relies on large amounts of golden labeled data. Though unsupervised endeavors have been proposed to alleviate this issue, they may fail to generate meaningful paraphrases due to the lack of supervision signals. In this work, we go beyond the existing paradigms and propose a novel approach to generate high-quality paraphrases with data of weak supervision. Specifically, we tackle the weakly-supervised paraphrase generation problem by: (1) obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion; and (2) developing a meta-learning framework to progressively select valuable samples for fine-tuning a pre-trained language model BART on the sentential paraphrasing task. We demonstrate that our approach achieves significant improvements over existing unsupervised approaches, and is even comparable in performance with supervised state-of-the-arts.

Commonsense-Focused Dialogues for Response Generation: An Empirical Study
Pei Zhou | Karthik Gopalakrishnan | Behnam Hedayatnia | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Smooth and effective communication requires the ability to perform latent or explicit commonsense inference. Prior commonsense reasoning benchmarks (such as SocialIQA and CommonsenseQA) mainly focus on the discriminative task of choosing the right answer from a set of candidates, and do not involve interactive language generation as in dialogue. Moreover, existing dialogue datasets do not explicitly focus on exhibiting commonsense as a facet. In this paper, we present an empirical study of commonsense in dialogue response generation. We first auto-extract commonsensical dialogues from existing dialogue datasets by leveraging ConceptNet, a commonsense knowledge graph. Furthermore, building on social contexts/situations in SocialIQA, we collect a new dialogue dataset with 25K dialogues aimed at exhibiting social commonsense in an interactive setting. We evaluate response generation models trained using these datasets and find that models trained on both extracted and our collected data produce responses that consistently exhibit more commonsense than baselines. Finally we propose an approach for automatic evaluation of commonsense that relies on features derived from ConceptNet and pre-trained language and dialog models, and show reasonable correlation with human evaluation of responses’ commonsense quality.

Personalized Entity Resolution with Dynamic Heterogeneous KnowledgeGraph Representations
Ying Lin | Han Wang | Jiangning Chen | Tong Wang | Yue Liu | Heng Ji | Yang Liu | Premkumar Natarajan
Proceedings of the 4th Workshop on e-Commerce and NLP

The growing popularity of Virtual Assistants poses new challenges for Entity Resolution, the task of linking mentions in text to their referent entities in a knowledge base. Specifically, in the shopping domain, customers tend to mention the entities implicitly (e.g., “organic milk”) rather than use the entity names explicitly, leading to a large number of candidate products. Meanwhile, for the same query, different customers may expect different results. For example, with “add milk to my cart”, a customer may refer to a certain product from his/her favorite brand, while some customers may want to re-order products they regularly purchase. Moreover, new customers may lack persistent shopping history, which requires us to enrich the connections between customers through products and their attributes. To address these issues, we propose a new framework that leverages personalized features to improve the accuracy of product ranking. We first build a cross-source heterogeneous knowledge graph from customer purchase history and product knowledge graph to jointly learn customer and product embeddings. After that, we incorporate product, customer, and history representations into a neural reranking model to predict which candidate is most likely to be purchased by a specific customer. Experiment results show that our model substantially improves the accuracy of the top ranked candidates by 24.6% compared to the state-of-the-art product search model.

Improving Factual Consistency of Abstractive Summarization on Customer Feedback
Yang Liu | Yifei Sun | Vincent Gao
Proceedings of the 4th Workshop on e-Commerce and NLP

E-commerce stores collect customer feedback to let sellers learn about customer concerns and enhance customer order experience. Because customer feedback often contains redundant information, a concise summary of the feedback can be generated to help sellers better understand the issues causing customer dissatisfaction. Previous state-of-the-art abstractive text summarization models make two major types of factual errors when producing summaries from customer feedback, which are wrong entity detection (WED) and incorrect product-defect description (IPD). In this work, we introduce a set of methods to enhance the factual consistency of abstractive summarization on customer feedback. We augment the training data with artificially corrupted summaries, and use them as counterparts of the target summaries. We add a contrastive loss term into the training objective so that the model learns to avoid certain factual errors. Evaluation results show that a large portion of WED and IPD errors are alleviated for BART and T5. Furthermore, our approaches do not depend on the structure of the summarization model and thus are generalizable to any abstractive summarization systems.

Multi-Sentence Knowledge Selection in Open-Domain Dialogue
Mihail Eric | Nicole Chartier | Behnam Hedayatnia | Karthik Gopalakrishnan | Pankaj Rajan | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 14th International Conference on Natural Language Generation

Incorporating external knowledge sources effectively in conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions on knowledge sources to simplify the overall task, such as the existence of a single relevant knowledge sentence per context. In this work, we evaluate the existing state of open-domain conversation knowledge selection, showing where the existing methodologies regarding data and evaluation are flawed. We then improve on them by proposing a new framework for collecting relevant knowledge, and create an augmented dataset based on the Wizard of Wikipedia (WOW) corpus, which we call WOW++. WOW++ averages 8 relevant knowledge sentences per dialogue context, embracing the inherent ambiguity of open-domain dialogue knowledge selection. We then benchmark various knowledge ranking algorithms on this augmented dataset with both intrinsic evaluation and extrinsic measures of response quality, showing that neural rerankers that use WOW++ can outperform rankers trained on standard datasets.

Think Before You Speak: Learning to Generate Implicit Knowledge for Response Generation by Self-Talk
Pei Zhou | Behnam Hedayatnia | Karthik Gopalakrishnan | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Humans make appropriate responses not only based on previous dialogue utterances but also on implicit background knowledge such as common sense. Although neural response generation models seem to produce human-like responses, they are mostly end-to-end and do not generate intermediate grounds between a dialogue history and responses. This work aims to study if and how we can train an RG model that talks with itself to generate implicit knowledge before making responses. We further investigate whether such models can identify when to generate implicit background knowledge and when it is not necessary. Experimental results show that compared with models that directly generate responses given a dialogue history, self-talk models produce better-quality responses according to human evaluation on grammaticality, coherence, and engagingness. Models that are trained to identify when to self-talk further improve the response quality. Analysis on generated implicit knowledge shows that models mostly use the knowledge appropriately in the responses.

Towards Zero and Few-shot Knowledge-seeking Turn Detection in Task-orientated Dialogue Systems
Di Jin | Shuyang Gao | Seokhwan Kim | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Most prior work on task-oriented dialogue systems is restricted to supporting domain APIs. However, users may have requests that are out of the scope of these APIs. This work focuses on identifying such user requests. Existing methods for this task mainly rely on fine-tuning pre-trained models on large annotated data. We propose a novel method, REDE, based on adaptive representation learning and density estimation. REDE can be applied to zero-shot cases, and quickly learns a high-performing detector with only a few shots by updating less than 3K parameters. We demonstrate REDE’s competitive performance on DSTC9 data and our newly collected test set.

2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Bonnie Webber | Trevor Cohn | Yulan He | Yang Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Incorporating Commonsense Knowledge Graph in Pretrained Models for Social Commonsense Tasks
Ting-Yun Chang | Yang Liu | Karthik Gopalakrishnan | Behnam Hedayatnia | Pei Zhou | Dilek Hakkani-Tur
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Pretrained language models have excelled at many NLP tasks recently; however, their social intelligence is still unsatisfactory. To enable this, machines need to have a more general understanding of our complicated world and develop the ability to perform commonsense reasoning besides fitting the specific downstream tasks. External commonsense knowledge graphs (KGs), such as ConceptNet, provide rich information about words and their relationships. Thus, towards general commonsense learning, we propose two approaches to implicitly and explicitly infuse such KGs into pretrained language models. We demonstrate our proposed methods perform well on SocialIQA, a social commonsense reasoning task, in both limited and full training data regimes.

Findings of the Association for Computational Linguistics: EMNLP 2020
Trevor Cohn | Yulan He | Yang Liu
Findings of the Association for Computational Linguistics: EMNLP 2020

Policy-Driven Neural Response Generation for Knowledge-Grounded Dialog Systems
Behnam Hedayatnia | Karthik Gopalakrishnan | Seokhwan Kim | Yang Liu | Mihail Eric | Dilek Hakkani-Tur
Proceedings of the 13th International Conference on Natural Language Generation

Open-domain dialog systems aim to generate relevant, informative and engaging responses. In this paper, we propose using a dialog policy to plan the content and style of target, open domain responses in the form of an action plan, which includes knowledge sentences related to the dialog context, targeted dialog acts, topic information, etc. For training, the attributes within the action plan are obtained by automatically annotating the publicly released Topical-Chat dataset. We condition neural response generators on the action plan which is then realized as target utterances at the turn and sentence levels. We also investigate different dialog policy models to predict an action plan given the dialog context. Through automated and human evaluation, we measure the appropriateness of the generated responses and check if the generation models indeed learn to realize the given action plans. We demonstrate that a basic dialog policy that operates at the sentence level generates better responses in comparison to turn level generation as well as baseline models with no action plan. Additionally the basic dialog policy has the added benefit of controllability.

2019

The LAIX Systems in the BEA-2019 GEC Shared Task
Ruobing Li | Chuan Wang | Yefei Zha | Yonghong Yu | Shiman Guo | Qiang Wang | Yang Liu | Hui Lin
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we describe two systems we developed for the three tracks in which we participated in the BEA-2019 GEC Shared Task. We investigate competitive classification models with bi-directional recurrent neural networks (Bi-RNN) and neural machine translation (NMT) models. For different tracks, we use ensemble systems to selectively combine the NMT models, the classification models, and some rules, and demonstrate that an ensemble solution can effectively improve GEC performance over single systems. Our GEC systems ranked first in the Unrestricted Track, and third in both the Restricted Track and the Low Resource Track.

Automated Essay Scoring with Discourse-Aware Neural Models
Farah Nadeem | Huy Nguyen | Yang Liu | Mari Ostendorf
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Automated essay scoring systems typically rely on hand-crafted features to predict essay quality, but such systems are limited by the cost of feature engineering. Neural networks offer an alternative to feature engineering, but they typically require more annotated data. This paper explores network structures, contextualized embeddings and pre-training strategies aimed at capturing discourse characteristics of essays. Experiments on three essay scoring tasks show benefits from all three strategies in different combinations, with simpler architectures being more effective when less training data is available.

2018

A Reinforcement Learning Framework for Natural Question Generation using Bi-discriminators
Zhihao Fan | Zhongyu Wei | Siyuan Wang | Yang Liu | Xuanjing Huang
Proceedings of the 27th International Conference on Computational Linguistics

Visual Question Generation (VQG) aims to ask natural questions about an image automatically. Existing research focuses on training models to fit the annotated dataset, which makes the task no different from other language generation tasks. We argue that natural questions need to have two specific attributes, from the perspectives of content and linguistics respectively: being natural and being human-written. Inspired by the setting of the discriminator in adversarial learning, we propose two discriminators, one for each attribute, to enhance the training. We then use the reinforcement learning framework to incorporate scores from the two discriminators as the reward to guide the training of the question generator. Experimental results on a benchmark VQG dataset show the effectiveness and robustness of our model compared to some state-of-the-art models in terms of both automatic and human evaluation metrics.

Incorporating Argument-Level Interactions for Persuasion Comments Evaluation using Co-attention Model
Lu Ji | Zhongyu Wei | Xiangkun Hu | Yang Liu | Qi Zhang | Xuanjing Huang
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we investigate the issue of persuasiveness evaluation for argumentative comments. Most of the existing research explores different text features of reply comments on word level and ignores interactions between participants. In general, viewpoints are usually expressed by multiple arguments and exchanged on argument level. To better model the process of dialogical argumentation, we propose a novel co-attention mechanism based neural network to capture the interactions between participants on argument level. Experimental results on a publicly available dataset show that the proposed model significantly outperforms some state-of-the-art methods for persuasiveness evaluation. Further analysis reveals that attention weights computed in our model are able to extract interactive argument pairs from the original post and the reply.

Incorporating Topic Aspects for Online Comment Convincingness Evaluation
Yunfan Gu | Zhongyu Wei | Maoran Xu | Hao Fu | Yang Liu | Xuanjing Huang
Proceedings of the 5th Workshop on Argument Mining

In this paper, we propose to incorporate topic aspect information for evaluating the convincingness of online comments. Our model uses a graph convolutional network to exploit implicit topic information within a discussion thread to assist the evaluation of the convincingness of each single comment. In order to test the effectiveness of our proposed model, we annotate topic information on top of a public dataset for argument convincingness evaluation. Experimental results show that topic information is able to improve the performance for convincingness evaluation. We also take a first step toward detecting topic aspects automatically.

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
Yang Liu | Tim Paek | Manasi Patwardhan
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2017

Using Context Information for Dialog Act Classification in DNN Framework
Yang Liu | Kun Han | Zhao Tan | Yun Lei
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Previous work on dialog act (DA) classification has investigated different methods, such as hidden Markov models, maximum entropy, conditional random fields, graphical models, and support vector machines. A few recent studies explored using deep learning neural networks for DA classification; however, it is not yet clear what the best method is for using dialog context or DA sequential information, or how much gain it brings. This paper proposes several ways of using context information for DA classification, all in the deep learning framework. The baseline system classifies each utterance using convolutional neural networks (CNN). Our proposed methods include using hierarchical models (recurrent neural networks (RNN) or CNN) for DA sequence tagging where the bottom layer takes the sentence CNN representation as input, concatenating predictions from the previous utterances with the CNN vector for classification, and performing sequence decoding based on the predictions from the sentence CNN model. We conduct thorough experiments and comparisons on the Switchboard corpus, demonstrate that incorporating context information significantly improves DA classification, and show that we achieve new state-of-the-art performance for this task.

A non-DNN Feature Engineering Approach to Dependency Parsing – FBAML at CoNLL 2017 Shared Task
Xian Qian | Yang Liu
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

For this year’s multilingual dependency parsing shared task, we developed a pipeline system, which uses a variety of features for each of its components. Unlike the recent popular deep learning approaches that learn low dimensional dense features using non-linear classifiers, our system uses structured linear classifiers to learn millions of sparse features. Specifically, we trained a linear classifier for sentence boundary prediction, linear chain conditional random fields (CRFs) for tokenization, part-of-speech tagging and morph analysis. A second order graph based parser learns the tree structure (without relations), and a linear tree CRF then assigns relations to the dependencies in the tree. Our system achieves reasonable performance – 67.87% official averaged macro F1 score.

2016

Using Relevant Public Posts to Enhance News Article Summarization
Chen Li | Zhongyu Wei | Yang Liu | Yang Jin | Fei Huang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

A news article summary usually consists of 2-3 key sentences that reflect the gist of that news article. In this paper we explore using public posts following a news article to improve automatic summary generation for the news article. We propose different approaches to incorporate information from public posts, including using frequency information from the posts to re-estimate bigram weights in the ILP-based summarization model and to re-weight a dependency tree edge’s importance for sentence compression, directly selecting sentences from posts as the final summary, and finally a strategy to combine the summarization results generated from news articles and posts. Our experiments on data collected from Facebook show that relevant public posts provide useful information and can be effectively leveraged to improve news article summarization results.

A Preliminary Study of Disputation Behavior in Online Debating Forum
Zhongyu Wei | Yandi Xia | Chen Li | Yang Liu | Zachary Stallbohm | Yi Li | Yang Jin
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Is This Post Persuasive? Ranking Argumentative Comments in Online Forum
Zhongyu Wei | Yang Liu | Yi Li
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

An Efficient Cross-lingual Model for Sentence Classification Using Convolutional Neural Network
Yandi Xia | Zhongyu Wei | Yang Liu
Proceedings of the ACL 2016 Student Research Workshop

2015

Improving Named Entity Recognition in Tweets via Detecting Non-Standard Words
Chen Li | Yang Liu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Feature Selection in Kernel Space: A Case Study on Dependency Parsing
Xian Qian | Yang Liu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Using Tweets to Help Sentence Compression for News Highlights Generation
Zhongyu Wei | Yang Liu | Chen Li | Wei Gao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Using External Resources and Joint Learning for Bigram Weighting in ILP-Based Multi-Document Summarization
Chen Li | Yang Liu | Lin Zhao
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Improving Update Summarization via Supervised ILP and Sentence Reranking
Chen Li | Yang Liu | Lin Zhao
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts
Yang Liu | Thamar Solorio
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

2014

2-Slave Dual Decomposition for Generalized Higher Order CRFs
Xian Qian | Yang Liu
Transactions of the Association for Computational Linguistics, Volume 2

We show that the decoding problem in generalized Higher Order Conditional Random Fields (CRFs) can be decomposed into two parts: one is a tree labeling problem that can be solved in linear time using dynamic programming; the other is a supermodular quadratic pseudo-Boolean maximization problem, which can be solved in cubic time using a minimum cut algorithm. We use dual decomposition to force their agreement. Experimental results on Twitter named entity recognition and sentence dependency tagging tasks show that our method outperforms spanning tree based dual decomposition.

Polynomial Time Joint Structural Inference for Sentence Compression
Xian Qian | Yang Liu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Improving Text Normalization via Unsupervised Model and Discriminative Reranking
Chen Li | Yang Liu
Proceedings of the ACL 2014 Student Research Workshop

Improving Multi-documents Summarization by Sentence Compression based on Expanded Constituent Parse Trees
Chen Li | Yang Liu | Fei Liu | Lin Zhao | Fuliang Weng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Using Supervised Bigram-based ILP for Extractive Summarization
Chen Li | Xian Qian | Yang Liu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Branch and Bound Algorithm for Dependency Parsing with Non-local Features
Xian Qian | Yang Liu
Transactions of the Association for Computational Linguistics, Volume 1

Graph based dependency parsing is inefficient when handling non-local features due to high computational complexity of inference. In this paper, we proposed an exact and efficient decoding algorithm based on the Branch and Bound (B&B) framework where non-local features are bounded by a linear combination of local features. Dynamic programming is used to search the upper bound. Experiments are conducted on English PTB and Chinese CTB datasets. We achieved competitive Unlabeled Attachment Score (UAS) when no additional resources are available: 93.17% for English and 87.25% for Chinese. Parsing speed is 177 words per second for English and 97 words per second for Chinese. Our algorithm is general and can be adapted to non-projective dependency parsing or other graphical models.

Simple Yet Powerful Native Language Identification on TOEFL11
Ching-Yi Wu | Po-Hsiang Lai | Yang Liu | Vincent Ng
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Exploring Word Class N-grams to Measure Language Development in Children
Gabriela Ramírez de la Rosa | Thamar Solorio | Manuel Montes | Yang Liu | Lisa Bedore | Elizabeth Peña | Aquiles Iglesias
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

Using Latent Dirichlet Allocation for Child Narrative Analysis
Khairun-nisa Hassanali | Yang Liu | Thamar Solorio
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

Document Summarization via Guided Sentence Compression
Chen Li | Fei Liu | Fuliang Weng | Yang Liu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Fast Joint Compression and Summarization via Graph Cuts
Xian Qian | Yang Liu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Disfluency Detection Using Multi-step Stacked Learning
Xian Qian | Yang Liu
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Joint Chinese Word Segmentation, POS Tagging and Parsing
Xian Qian | Yang Liu
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Improving Text Normalization using Character-Blocks Based Models and System Combination
Chen Li | Yang Liu
Proceedings of COLING 2012

User Participation Prediction in Online Forums
Zhonghua Qu | Yang Liu
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Sentence Dependency Tagging in Online Question Answering Forums
Zhonghua Qu | Yang Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Two-step Approach to Sentence Compression of Spoken Utterances
Dong Wang | Xian Qian | Yang Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

A Character-Level Machine Translation Approach for Normalization of SMS Abbreviations
Deana Pennell | Yang Liu
Proceedings of 5th International Joint Conference on Natural Language Processing

Learning from Chinese-English Parallel Data for Chinese Tense Prediction
Feifan Liu | Fei Liu | Yang Liu
Proceedings of 5th International Joint Conference on Natural Language Processing

Finding Problem Solving Threads in Online Forum
Zhonghua Qu | Yang Liu
Proceedings of 5th International Joint Conference on Natural Language Processing

Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages
Ani Nenkova | Julia Hirschberg | Yang Liu
Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages

Why is “SXSW” trending? Exploring Multiple Text Sources for Twitter Topic Summarization
Fei Liu | Yang Liu | Fuliang Weng
Proceedings of the Workshop on Language in Social Media (LSM 2011)

Measuring Language Development in Early Childhood Education: A Case Study of Grammar Checking in Child Language Transcripts
Khairun-nisa Hassanali | Yang Liu
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications

A Cross-corpus Study of Unsupervised Subjectivity Identification based on Calibrated EM
Dong Wang | Yang Liu
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

A Pilot Study of Opinion Summarization in Conversations
Dong Wang | Yang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

N-Best Rescoring Based on Pitch-accent Patterns
Je Hun Jeon | Wen Wang | Yang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Insertion, Deletion, or Substitution? Normalizing Text Messages without Pre-categorization nor Supervision
Fei Liu | Fuliang Weng | Bingqing Wang | Yang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Interactive Group Suggesting for Twitter
Zhonghua Qu | Yang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Automatic Summarization
Ani Nenkova | Sameer Maskey | Yang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2010

pdf bib
Using Confusion Networks for Speech Summarization
Shasha Xie | Yang Liu
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Improving Blog Polarity Classification via Topic Analysis and Adaptive Methods
Feifan Liu | Dong Wang | Bin Li | Yang Liu
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Coling 2010: Demonstrations
Yang Liu | Ting Liu
Coling 2010: Demonstrations

pdf bib
Non-Expert Evaluation of Summarization Systems is Risky
Dan Gillick | Yang Liu
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

2009

pdf bib
A Corpus-Based Approach for the Prediction of Language Impairment in Monolingual English and Spanish-English Bilingual Children
Keyur Gabani | Melissa Sherman | Thamar Solorio | Yang Liu | Lisa Bedore | Elizabeth Peña
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts
Feifan Liu | Deana Pennell | Fei Liu | Yang Liu
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Semi-supervised Learning for Automatic Prosodic Event Detection Using Co-training Algorithm
Je Hun Jeon | Yang Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
From Extractive to Abstractive Meeting Summaries: Can It Be Done by Sentence Compression?
Fei Liu | Yang Liu
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

pdf bib
Learning to Predict Code-Switching Points
Thamar Solorio | Yang Liu
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Part-of-Speech Tagging for English-Spanish Code-Switched Text
Thamar Solorio | Yang Liu
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
What Are Meeting Summaries? An Analysis of Human Extractive Summaries in Meeting Corpus
Fei Liu | Yang Liu
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

pdf bib
Using Language Models to Identify Language Impairment in Spanish-English Bilingual Children
Thamar Solorio | Yang Liu
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

pdf bib
Correlation between ROUGE and Human Evaluation of Extractive Meeting Summaries
Feifan Liu | Yang Liu
Proceedings of ACL-08: HLT, Short Papers

2007

pdf bib
Unsupervised Language Model Adaptation Incorporating Named Entity Information
Feifan Liu | Yang Liu
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
Look Who is Talking: Soundbite Speaker Name Recognition in Broadcast News Speech
Feifan Liu | Yang Liu
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

2006

pdf bib
Off-Topic Detection in Conversational Telephone Speech
Robin Stewart | Andrea Danyluk | Yang Liu
Proceedings of the Analyzing Conversations in Text and Speech

pdf bib
PCFGs with Syntactic and Prosodic Indicators of Speech Repairs
John Hale | Izhak Shafran | Lisa Yung | Bonnie J. Dorr | Mary Harper | Anna Krasnyanskaya | Matthew Lease | Yang Liu | Brian Roark | Matthew Snover | Robin Stewart
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Initial Study on Automatic Identification of Speaker Role in Broadcast News Speech
Yang Liu
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
SParseval: Evaluation Metrics for Parsing Speech
Brian Roark | Mary Harper | Eugene Charniak | Bonnie Dorr | Mark Johnson | Jeremy Kahn | Yang Liu | Mari Ostendorf | John Hale | Anna Krasnyanskaya | Matthew Lease | Izhak Shafran | Matthew Snover | Robin Stewart | Lisa Yung
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

While both spoken and written language processing stand to benefit from parsing, the standard Parseval metrics (Black et al., 1991) and their canonical implementation (Sekine and Collins, 1997) are only useful for text. The Parseval metrics are undefined when the words input to the parser do not match the words in the gold standard parse tree exactly, and word errors are unavoidable with automatic speech recognition (ASR) systems. To fill this gap, we have developed a publicly available tool for scoring parses that implements a variety of metrics which can handle mismatches in words and segmentations, including: alignment-based bracket evaluation, alignment-based dependency evaluation, and a dependency evaluation that does not require alignment. We describe the different metrics, how to use the tool, and the outcome of an extensive set of experiments on the sensitivity.

pdf bib
Linguistic Resources for Speech Parsing
Ann Bies | Stephanie Strassel | Haejoong Lee | Kazuaki Maeda | Seth Kulick | Yang Liu | Mary Harper | Matthew Lease
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We report on the success of a two-pass approach to annotating metadata, speech effects and syntactic structure in English conversational speech: separately annotating transcribed speech for structural metadata, or structural events, (fillers, speech repairs (or edit dysfluencies) and SUs, or syntactic/semantic units) and for syntactic structure (treebanking constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Similarly, constraints imposed by text processing systems such as parsers can be used to help improve identification of disfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.

2005

pdf bib
Using Conditional Random Fields for Sentence Boundary Detection in Speech
Yang Liu | Andreas Stolcke | Elizabeth Shriberg | Mary Harper
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

pdf bib
Evaluating Factors Impacting the Accuracy of Forced Alignments in a Multimodal Corpus
Lei Chen | Yang Liu | Mary Harper | Eduardo Maia | Susan McRoy
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

People, when processing human-to-human communication, utilize everything they can in order to understand that communication, including speech and information such as the time and location of an interlocutor's gesture and gaze. Speech and gesture are known to exhibit a synchronous relationship in human communication; however, the precise nature of that relationship requires further investigation. The construction of computer models of multimodal human communication would be enabled by the availability of multimodal communication corpora annotated with synchronized gesture and speech features. To investigate the temporal relationships of these knowledge sources, we have collected and are annotating several multimodal corpora with time-aligned features. Forced alignment between a speech file and its transcription is a crucial part of multimodal corpus production. This paper investigates a number of factors that may contribute to highly accurate forced alignments to support the rapid production of these multimodal corpora including the acoustic model, the match between the speech used for training the system and that to be force aligned, the amount of data used to train the ASR system, the availability of speaker adaptation, and the duration of alignment segments.

pdf bib
Comparing and Combining Generative and Posterior Probability Models: Some Advances in Sentence Boundary Detection in Speech
Yang Liu | Andreas Stolcke | Elizabeth Shriberg | Mary Harper
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improving Automatic Sentence Boundary Detection with Confusion Networks
D. Hillard | M. Ostendorf | A. Stolcke | Y. Liu | E. Shriberg
Proceedings of HLT-NAACL 2004: Short Papers

2003

pdf bib
Word Fragments Identification Using Acoustic-Prosodic Features in Conversational Speech
Yang Liu
Proceedings of the HLT-NAACL 2003 Student Research Workshop
