Kun Qian


2022

pdf bib
Memformer: A Memory-Augmented Transformer for Sequence Modeling
Qingyang Wu | Zhenzhong Lan | Kun Qian | Jing Gu | Alborz Geramifard | Zhou Yu
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022

Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network for sequence modeling, that utilizes an external dynamic memory to encode and retrieve past information. Our model achieves linear time complexity and constant memory space complexity when processing long sequences. We also propose a new optimization scheme, memory replay back-propagation (MRBP), which promotes long-range back-propagation through time with a significantly reduced memory requirement. Experimental results show that Memformer has achieved comparable performance compared against the baselines by using 8.1x less memory space and 3.2x faster on inference. Analysis of the attention pattern shows that our external memory slots can encode and retain important information through timesteps.

pdf bib
Database Search Results Disambiguation for Task-Oriented Dialog Systems
Kun Qian | Satwik Kottur | Ahmad Beirami | Shahin Shayandeh | Paul Crook | Alborz Geramifard | Zhou Yu | Chinnadhurai Sankar
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

As task-oriented dialog systems are becoming increasingly popular in our lives, more realistic tasks have been proposed and explored. However, new practical challenges arise. For instance, current dialog systems cannot effectively handle multiplesearch results when querying a database, due to the lack of such scenarios in existing public datasets. In this paper, we propose Database Search Result (DSR) Disambiguation, a novel task that focuses on disambiguating database search results, which enhances user experience by allowing them to choose from multiple options instead of just one. To study this task, we augment the popular task-oriented dialog datasets (MultiWOZ and SGD) with turns that resolve ambiguities by (a) synthetically generating turns through a pre-defined grammar, and (b) collecting human paraphrases for a subset. We find that training on our augmented dialog data improves the model’s ability to deal with ambiguous scenarios, without sacrificing performance on unmodified turns. Furthermore, pre-fine tuning and multi-task learning help our model to improve performance on DSR-disambiguation even in the absence of in-domain data, suggesting that it can be learned as a universal dialog skill. Our data and code will be made publicly available.

2021

pdf bib
Annotation Inconsistency and Entity Bias in MultiWOZ
Kun Qian | Ahmad Beirami | Zhouhan Lin | Ankita De | Alborz Geramifard | Zhou Yu | Chinnadhurai Sankar
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

MultiWOZ (Budzianowski et al., 2018) is one of the most popular multi-domain taskoriented dialog datasets, containing 10K+ annotated dialogs covering eight domains. It has been widely accepted as a benchmark for various dialog tasks, e.g., dialog state tracking (DST), natural language generation (NLG) and end-to-end (E2E) dialog modeling. In this work, we identify an overlooked issue with dialog state annotation inconsistencies in the dataset, where a slot type is tagged inconsistently across similar dialogs leading to confusion for DST modeling. We propose an automated correction for this issue, which is present in 70% of the dialogs. Additionally, we notice that there is significant entity bias in the dataset (e.g., “cambridge” appears in 50% of the destination cities in the train domain). The entity bias can potentially lead to named entity memorization in generative models, which may go unnoticed as the test set suffers from a similar entity bias as well. We release a new test set with all entities replaced with unseen entities. Finally, we benchmark joint goal accuracy (JGA) of the state-of-theart DST baselines on these modified versions of the data. Our experiments show that the annotation inconsistency corrections lead to 7-10% improvement in JGA. On the other hand, we observe a 29% drop in JGA when models are evaluated on the new test set with unseen entities.

2020

pdf bib
Learning Structured Representations of Entity Names using Active Learning and Weak Supervision
Kun Qian | Poornima Chozhiyath Raman | Yunyao Li | Lucian Popa
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Structured representations of entity names are useful for many entity-related tasks such as entity normalization and variant generation. Learning the implicit structured representations of entity names without context and external knowledge is particularly challenging. In this paper, we present a novel learning framework that combines active learning and weak supervision to solve this problem. Our experimental evaluation show that this framework enables the learning of high-quality models from merely a dozen or so labeled examples.

pdf bib
A Survey of the State of Explainable AI for Natural Language Processing
Marina Danilevsky | Kun Qian | Ranit Aharonov | Yannis Katsis | Ban Kawas | Prithviraj Sen
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area.

pdf bib
Answering Complex Questions by Combining Information from Curated and Extracted Knowledge Bases
Nikita Bhutani | Xinyi Zheng | Kun Qian | Yunyao Li | H. Jagadish
Proceedings of the First Workshop on Natural Language Interfaces

Knowledge-based question answering (KB_QA) has long focused on simple questions that can be answered from a single knowledge source, a manually curated or an automatically extracted KB. In this work, we look at answering complex questions which often require combining information from multiple sources. We present a novel KB-QA system, Multique, which can map a complex question to a complex query pattern using a sequence of simple queries each targeted at a specific KB. It finds simple queries using a neural-network based model capable of collective inference over textual relations in extracted KB and ontological relations in curated KB. Experiments show that our proposed system outperforms previous KB-QA systems on benchmark datasets, ComplexWebQuestions and WebQuestionsSP.

2019

pdf bib
How to Build User Simulators to Train RL-based Dialog Systems
Weiyan Shi | Kun Qian | Xuewei Wang | Zhou Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

User simulators are essential for training reinforcement learning (RL) based dialog models. The performance of the simulator directly impacts the RL policy. However, building a good user simulator that models real user behaviors is challenging. We propose a method of standardizing user simulator building that can be used by the community to compare dialog system quality using the same set of user simulators fairly. We present implementations of six user simulators trained with different dialog planning and generation methods. We then calculate a set of automatic metrics to evaluate the quality of these simulators both directly and indirectly. We also ask human users to assess the simulators directly and indirectly by rating the simulated dialogs and interacting with the trained systems. This paper presents a comprehensive evaluation framework for user simulator study and provides a better understanding of the pros and cons of different user simulators, as well as their impacts on the trained systems.

pdf bib
Domain Adaptive Dialog Generation via Meta Learning
Kun Qian | Zhou Yu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Domain adaptation is an essential task in dialog system building because there are so many new dialog tasks created for different needs every day. Collecting and annotating training data for these new tasks is costly since it involves real user interactions. We propose a domain adaptive dialog generation method based on meta-learning (DAML). DAML is an end-to-end trainable dialog system model that learns from multiple rich-resource tasks and then adapts to new domains with minimal training samples. We train a dialog system model using multiple rich-resource single-domain dialog data by applying the model-agnostic meta-learning algorithm to dialog domain. The model is capable of learning a competitive dialog system on a new domain with only a few training examples in an efficient manner. The two-step gradient updates in DAML enable the model to learn general features across multiple tasks. We evaluate our method on a simulated dialog dataset and achieve state-of-the-art performance, which is generalizable to new tasks.

pdf bib
Low-resource Deep Entity Resolution with Transfer and Active Learning
Jungo Kasai | Kun Qian | Sairam Gurajada | Yunyao Li | Lucian Popa
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Entity resolution (ER) is the task of identifying different representations of the same real-world entities across databases. It is a key step for knowledge base creation and text mining. Recent adaptation of deep learning methods for ER mitigates the need for dataset-specific feature engineering by constructing distributed representations of entity records. While these methods achieve state-of-the-art performance over benchmark data, they require large amounts of labeled data, which are typically unavailable in realistic ER applications. In this paper, we develop a deep learning-based method that targets low-resource settings for ER through a novel combination of transfer learning and active learning. We design an architecture that allows us to learn a transferable model from a high-resource setting to a low-resource one. To further adapt to the target dataset, we incorporate active learning that carefully selects a few informative examples to fine-tune the transferred model. Empirical evaluation demonstrates that our method achieves comparable, if not better, performance compared to state-of-the-art learning-based methods while using an order of magnitude fewer labels.

2018

pdf bib
Exploiting Structure in Representation of Named Entities using Active Learning
Nikita Bhutani | Kun Qian | Yunyao Li | H. V. Jagadish | Mauricio Hernandez | Mitesh Vasa
Proceedings of the 27th International Conference on Computational Linguistics

Fundamental to several knowledge-centric applications is the need to identify named entities from their textual mentions. However, entities lack a unique representation and their mentions can differ greatly. These variations arise in complex ways that cannot be captured using textual similarity metrics. However, entities have underlying structures, typically shared by entities of the same entity type, that can help reason over their name variations. Discovering, learning and manipulating these structures typically requires high manual effort in the form of large amounts of labeled training data and handwritten transformation programs. In this work, we propose an active-learning based framework that drastically reduces the labeled data required to learn the structures of entities. We show that programs for mapping entity mentions to their structures can be automatically generated using human-comprehensible labels. Our experiments show that our framework consistently outperforms both handwritten programs and supervised learning models. We also demonstrate the utility of our framework in relation extraction and entity resolution tasks.