Ming Tan


DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
Zheng Li | Zijian Wang | Ming Tan | Ramesh Nallapati | Parminder Bhatia | Andrew Arnold | Bing Xiang | Dan Roth
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model. Empirical analyses show that, despite the challenging nature of generative tasks, we were able to achieve a 16.5x model footprint compression ratio with little performance drop relative to the full-precision counterparts on multiple summarization and QA datasets. We further pushed the limit of compression ratio to 27.7x and presented the performance-efficiency trade-off for generative tasks using pre-trained models. To the best of our knowledge, this is the first work aiming to effectively distill and quantize sequence-to-sequence pre-trained models for language generation tasks.
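The two ingredients named in the abstract, quantization and distillation, can be illustrated with a minimal sketch. This is not the DQ-BART implementation: the symmetric uniform quantizer, the KL-based loss, and the function names `quantize_uniform` and `distillation_loss` are assumptions chosen for illustration only.

```python
import math


def quantize_uniform(weights, num_bits):
    """Symmetric uniform fake-quantization (illustrative assumption).

    Each weight is snapped to the nearest multiple of a shared scale, so only
    2 * (2**(num_bits - 1) - 1) + 1 distinct values remain.
    """
    if num_bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    levels = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student) between the two output distributions."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy values: a low-precision student is penalized for
# diverging from the full-precision teacher's output distribution.
student_weights = quantize_uniform([0.9, -1.0, 0.2], num_bits=2)
loss = distillation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8])
```

In a joint setup of this kind, the student's forward pass would use the quantized weights while the loss pulls its outputs toward the teacher's; how the paper combines the two in detail is not stated in the abstract.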


Generating Synthetic Data for Task-Oriented Semantic Parsing with Hierarchical Representations
Ke Tran | Ming Tan
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Modern conversational AI systems support natural language understanding for a wide variety of capabilities. While a majority of these tasks can be accomplished using a simple and flat representation of intents and slots, more sophisticated capabilities require complex hierarchical representations supported by semantic parsing. State-of-the-art semantic parsers are trained using supervised learning with data labeled according to a hierarchical schema, which might be costly to obtain or not readily available for a new domain. In this work, we explore the possibility of generating synthetic data for neural semantic parsing using a pretrained denoising sequence-to-sequence model (i.e., BART). Specifically, we first extract masked templates from the existing labeled utterances, and then fine-tune BART to generate synthetic utterances conditioned on the extracted templates. Finally, we use an auxiliary parser (AP) to filter the generated utterances. The AP guarantees the quality of the generated data. We show the potential of our approach when evaluating on the Facebook TOP dataset for the navigation domain.


Out-of-Domain Detection for Low-Resource Text Classification Tasks
Ming Tan | Yang Yu | Haoyu Wang | Dakuo Wang | Saloni Potdar | Shiyu Chang | Mo Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Out-of-domain (OOD) detection for low-resource text classification is a realistic but understudied task. The goal is to detect OOD cases with limited in-domain (ID) training data, since in machine learning applications we observe that training data is often insufficient. In this work, we propose an OOD-resistant Prototypical Network to tackle this zero-shot OOD detection and few-shot ID classification task. Evaluations on real-world datasets show that the proposed solution outperforms state-of-the-art methods on the zero-shot OOD detection task, while maintaining competitive performance on the ID classification task.
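The general idea of classifying with class prototypes while rejecting OOD inputs can be sketched as follows. This is a generic, inference-time illustration and not the paper's OOD-resistant training procedure: the Euclidean distance, the fixed `max_distance` threshold, and all names and toy vectors are assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(support):
    """Map each in-domain label to the mean of its few support embeddings."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def classify_or_reject(query, prototypes, max_distance):
    """Return the nearest prototype's label, or "OOD" if it is too far away.

    max_distance is a hypothetical, hand-set threshold for this sketch.
    """
    label, distance = min(
        ((lbl, euclidean(query, proto)) for lbl, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else "OOD"


# Toy 2-D "embeddings" standing in for encoder outputs (assumed values).
support = {
    "weather": [[0.0, 0.0], [0.0, 2.0]],
    "music": [[10.0, 10.0], [10.0, 12.0]],
}
prototypes = build_prototypes(support)
near = classify_or_reject([0.5, 1.0], prototypes, max_distance=2.0)
far = classify_or_reject([5.0, 6.0], prototypes, max_distance=2.0)
```

Because no OOD examples are needed to build the prototypes, a rejection rule of this shape is one way to read "zero-shot OOD detection" alongside few-shot ID classification.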

Context-Aware Conversation Thread Detection in Multi-Party Chat
Ming Tan | Dakuo Wang | Yupeng Gao | Haoyu Wang | Saloni Potdar | Xiaoxiao Guo | Shiyu Chang | Mo Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In multi-party chat, it is common for multiple conversations to occur concurrently, leading to intermingled conversation threads in chat logs. In this work, we propose a novel Context-Aware Thread Detection (CATD) model that automatically disentangles these conversation threads. We evaluate our model on four real-world datasets and demonstrate an overall improvement in thread detection accuracy over state-of-the-art benchmarks.

Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers
Haoyu Wang | Ming Tan | Mo Yu | Shiyu Chang | Dakuo Wang | Kun Xu | Xiaoxiao Guo | Saloni Potdar
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Many approaches to extracting multiple relations from a paragraph require multiple passes over the paragraph. In practice, multiple passes are computationally expensive, which makes it difficult to scale to longer paragraphs and larger text corpora. In this work, we focus on the task of multiple relation extraction by encoding the paragraph only once. We build our solution upon pre-trained self-attentive models (Transformer), where we first add a structured prediction layer to handle extraction between multiple entity pairs, then enhance the paragraph embedding with entity-aware attention to capture the multiple relational information associated with each entity. We show that our approach is not only scalable but also achieves state-of-the-art performance on the standard benchmark ACE 2005.


Improved Representation Learning for Question Answer Matching
Ming Tan | Cicero dos Santos | Bing Xiang | Bowen Zhou
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

FastHybrid: A Hybrid Model for Efficient Answer Selection
Lidan Wang | Ming Tan | Jiawei Han
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Answer selection is a core component of any question-answering system. It aims to select correct answer sentences for a given question from a pool of candidate sentences. In recent years, many deep learning methods have been proposed and have shown excellent results for this task. However, these methods typically require extensive parameter (and hyper-parameter) tuning, which gives rise to efficiency issues for large-scale datasets and potentially makes them less portable across new datasets and domains (as re-tuning is usually required). In this paper, we propose an extremely efficient hybrid model (FastHybrid) that tackles the problem from both an accuracy and a scalability point of view. FastHybrid is a light-weight model that requires little tuning and adaptation across different domains. It combines a fast deep model (which will be introduced in the method section) with an initial information retrieval model to effectively and efficiently handle answer selection. We introduce a new efficient attention mechanism in the hybrid model and demonstrate its effectiveness on several QA datasets. Experimental results show that although the hybrid uses no training data, its accuracy is often on par with supervised deep learning techniques, while significantly reducing training and tuning costs across different domains.


A Corpus Level MIRA Tuning Strategy for Machine Translation
Ming Tan | Tian Xia | Shaojun Wang | Bowen Zhou
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


A Scalable Distributed Syntactic, Semantic, and Lexical Language Model
Ming Tan | Wenli Zhou | Lei Zheng | Shaojun Wang
Computational Linguistics, Volume 38, Issue 3 - September 2012


A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation
Ming Tan | Wenli Zhou | Lei Zheng | Shaojun Wang
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies