Fei Mi


Continual Prompt Tuning for Dialog State Tracking
Qi Zhu | Bing Li | Fei Mi | Xiaoyan Zhu | Minlie Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A desirable dialog system should be able to continually learn new skills without forgetting old ones, and thereby adapt to new domains or tasks in its life cycle. However, continually training a model often leads to a well-known catastrophic forgetting issue. In this paper, we present Continual Prompt Tuning, a parameter-efficient framework that not only avoids forgetting but also enables knowledge transfer between tasks. To avoid forgetting, we only learn and store a few prompt tokens’ embeddings for each task while freezing the backbone pre-trained model. To achieve bi-directional knowledge transfer among tasks, we propose several techniques (continual prompt initialization, query fusion, and memory replay) to transfer knowledge from preceding tasks and a memory-guided technique to transfer knowledge from subsequent tasks. Extensive experiments demonstrate the effectiveness and efficiency of our proposed method on continual learning for dialog state tracking, compared with state-of-the-art baselines.
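The idea of storing only a few prompt-token embeddings per task, while the backbone stays frozen, can be illustrated with a minimal sketch. The `PromptStore` class, its method names, and the choice to initialize a new task's prompt by copying an earlier task's prompt are assumptions made for illustration; the abstract names continual prompt initialization but does not specify it, and query fusion and memory replay are not shown.

```python
import numpy as np


class PromptStore:
    """Per-task soft prompts kept outside a frozen backbone (hypothetical helper)."""

    def __init__(self, num_prompt_tokens, embed_dim, seed=0):
        self.shape = (num_prompt_tokens, embed_dim)
        self.rng = np.random.default_rng(seed)
        self.prompts = {}  # task name -> (num_prompt_tokens, embed_dim) array

    def add_task(self, task, init_from=None):
        """Create the prompt for a new task.

        Assumption: when `init_from` is given, the new prompt starts as a copy
        of that earlier task's prompt; otherwise it is randomly initialized.
        """
        if init_from is None:
            prompt = self.rng.normal(scale=0.02, size=self.shape)
        else:
            prompt = self.prompts[init_from].copy()
        self.prompts[task] = prompt
        return prompt

    def prepend(self, task, input_embeds):
        """Prepend the task's prompt embeddings to the input token embeddings."""
        return np.concatenate([self.prompts[task], input_embeds], axis=0)
```

Because each task owns a separate array, updating the prompt of a later task leaves earlier prompts untouched, which is the sense in which this setup avoids forgetting.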

UniDS: A Unified Dialogue System for Chit-Chat and Task-oriented Dialogues
Xinyan Zhao | Bin He | Yasheng Wang | Yitong Li | Fei Mi | Yajiao Liu | Xin Jiang | Qun Liu | Huanhuan Chen
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

With the advances in deep learning, tremendous progress has been made with chit-chat dialogue systems and task-oriented dialogue systems. However, these two systems are often tackled separately in current methods. To achieve more natural interaction with humans, dialogue systems need to be capable of both chatting and accomplishing tasks. To this end, we propose a unified dialogue system (UniDS) with the two aforementioned skills. In particular, we design a unified dialogue data schema, compatible with both chit-chat and task-oriented dialogues. Besides, we propose a two-stage training method to train UniDS based on the unified dialogue data schema. UniDS does not need to add extra parameters to existing chit-chat dialogue systems. Experimental results demonstrate that the proposed UniDS performs comparably to the state-of-the-art chit-chat dialogue systems and task-oriented dialogue systems. More importantly, UniDS achieves better robustness than pure dialogue systems and satisfactory switch ability between two types of dialogues.

Compilable Neural Code Generation with Compiler Feedback
Xin Wang | Yasheng Wang | Yao Wan | Fei Mi | Yitong Li | Pingyi Zhou | Jin Liu | Hao Wu | Xin Jiang | Qun Liu
Findings of the Association for Computational Linguistics: ACL 2022

Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Existing deep-learning approaches model code generation as text generation, either constrained by grammar structures in the decoder, or driven by pre-trained language models on large-scale code corpora (e.g., CodeGPT, PLBART, and CodeT5). However, few of them account for compilability of the generated programs. To improve compilability of the generated programs, this paper proposes COMPCODER, a three-stage pipeline utilizing compiler feedback for compilable code generation, including language model fine-tuning, compilability reinforcement, and compilability discrimination. Comprehensive experiments on two code generation tasks demonstrate the effectiveness of our proposed approach, improving the success rate of compilation from 44.18 to 89.18 in code completion on average and from 70.3 to 96.2 in text-to-code generation, respectively, compared with the state-of-the-art CodeGPT.
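As a rough illustration of what a compiler feedback signal can look like, the sketch below checks whether a generated snippet compiles and picks the first candidate that does. It assumes the generated programs are Python and uses the built-in `compile()`; the function names are hypothetical, and this is not the COMPCODER pipeline, whose reinforcement and discrimination stages are not reproduced here.

```python
def compiles(source):
    """Return True if `source` parses and compiles as a Python module."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        # SyntaxError covers indentation errors too; ValueError can be raised
        # for sources containing null bytes on some Python versions.
        return False


def first_compilable(candidates):
    """Return the first candidate that compiles, or None if none does."""
    for candidate in candidates:
        if compiles(candidate):
            return candidate
    return None
```

A boolean of this kind could serve as a reward during training or as a filter over sampled candidates; which of these a given system uses is a design choice.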


Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems
Fei Mi | Wanhao Zhou | Lingjing Kong | Fengyu Cai | Minlie Huang | Boi Faltings
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

As the labeling cost for different modules in task-oriented dialog (ToD) systems is high, a major challenge is to train different modules with the least amount of labeled data. Recently, large-scale pre-trained language models have shown promising results for few-shot learning in ToD. In this paper, we devise a self-training approach to utilize the abundant unlabeled dialog data to further improve state-of-the-art pre-trained models in few-shot learning scenarios for ToD systems. Specifically, we propose a self-training approach that iteratively labels the most confident unlabeled data to train a stronger Student model. Moreover, a new text augmentation technique (GradAug) is proposed to better train the Student by replacing non-crucial tokens using a masked language model. We conduct extensive experiments and present analyses on four downstream tasks in ToD, including intent classification, dialog state tracking, dialog act prediction, and response selection. Empirical results demonstrate that the proposed self-training approach consistently improves state-of-the-art pre-trained models (BERT, ToD-BERT) when only a small amount of labeled data is available.
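The step of labeling the most confident unlabeled data can be sketched as follows. This is a minimal illustration, assuming the Teacher outputs class probabilities and that confidence is the maximum class probability; the function name and the `top_k` parameter are hypothetical, and GradAug is not shown.

```python
import numpy as np


def select_confident(probs, top_k):
    """Pick the top_k most confident unlabeled examples and their pseudo-labels.

    probs: array of shape (num_examples, num_classes) with the Teacher's
           predicted class probabilities (an assumed input format).
    Returns (indices, pseudo_labels), ordered from most to least confident.
    """
    confidence = probs.max(axis=1)
    pseudo_labels = probs.argmax(axis=1)
    order = np.argsort(-confidence, kind="stable")[:top_k]
    return order, pseudo_labels[order]
```

In an iterative loop, the selected examples would be moved from the unlabeled pool into the training set before the Student is retrained.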


Continual Learning for Natural Language Generation in Task-oriented Dialog Systems
Fei Mi | Liangwei Chen | Mengjie Zhao | Minlie Huang | Boi Faltings
Findings of the Association for Computational Linguistics: EMNLP 2020

Natural language generation (NLG) is an essential component of task-oriented dialog systems. Despite the recent success of neural approaches for NLG, they are typically developed in an offline manner for particular domains. To better fit real-life applications where new data come in a stream, we study NLG in a “continual learning” setting to expand its knowledge to new domains or functionalities incrementally. The major challenge towards this goal is catastrophic forgetting, meaning that a continually trained model tends to forget the knowledge it has learned before. To this end, we propose a method called ARPER (Adaptively Regularized Prioritized Exemplar Replay) by replaying prioritized historical exemplars, together with an adaptive regularization technique based on Elastic Weight Consolidation. Extensive experiments to continually learn new domains and intents are conducted on MultiWoZ-2.0 to benchmark ARPER with a wide range of techniques. Empirical results demonstrate that ARPER significantly outperforms other methods by effectively mitigating the detrimental catastrophic forgetting issue.
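For reference, the standard Elastic Weight Consolidation objective that the regularizer builds on can be written as below. The symbols are assumptions introduced here: $\theta^{*}$ denotes the parameters after the previous tasks, $F_i$ the diagonal Fisher information for parameter $i$, and $\lambda$ the regularization strength. This is the generic EWC form, not ARPER's adaptive variant, which the abstract does not specify.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta)
  + \frac{\lambda}{2} \sum_{i} F_i \left(\theta_i - \theta^{*}_i\right)^2
```

The penalty discourages changes to parameters that were important for earlier tasks, while exemplar replay adds stored past examples to the new task's training data.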

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
Mengjie Zhao | Tao Lin | Fei Mi | Martin Jaggi | Hinrich Schütze
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT, RoBERTa, and DistilBERT on eleven diverse NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred. Intrinsic evaluations show that representations computed by our binary masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
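A minimal sketch of learning a binary mask over frozen weights, for a single linear layer, might look like this. It assumes real-valued mask scores that are thresholded into a 0/1 mask and updated with a straight-through estimator under a squared-error loss; all names, the threshold, and the loss are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np


def binarize(scores, threshold=0.0):
    """Turn real-valued mask scores into a 0/1 mask."""
    return (scores > threshold).astype(scores.dtype)


def masked_forward(x, weights, scores, threshold=0.0):
    """Linear layer whose frozen weights are gated element-wise by the mask."""
    return x @ (weights * binarize(scores, threshold))


def ste_score_grad(x, weights, scores, target, threshold=0.0):
    """Gradient of 0.5 * mean squared error with respect to the mask scores.

    Straight-through estimator: the non-differentiable thresholding step is
    treated as the identity, so the gradient with respect to the binary mask
    is passed unchanged to the scores.
    """
    pred = masked_forward(x, weights, scores, threshold)
    grad_out = (pred - target) / x.shape[0]
    grad_masked_weights = x.T @ grad_out
    return grad_masked_weights * weights
```

Only `scores` is updated during training; `weights` is never modified, so each additional task needs to store just one binary mask rather than a full copy of the model.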