Qile Zhu


pdf bib
A Batch Normalized Inference Network Keeps the KL Vanishing Away
Qile Zhu | Wei Bi | Xiaojiang Liu | Xiyao Ma | Xiaolin Li | Dapeng Wu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Variational Autoencoder (VAE) is widely used as a generative model to approximate a model’s posterior on latent variables by combining the amortized variational inference and deep neural networks. However, when paired with strong autoregressive decoders, VAE often converges to a degenerated local optimum known as “posterior collapse”. Previous approaches consider the Kullback–Leibler divergence (KL) individual for each datapoint. We propose to let the KL follow a distribution across the whole dataset, and analyze that it is sufficient to prevent posterior collapse by keeping the expectation of the KL’s distribution positive. Then we propose Batch Normalized-VAE (BN-VAE), a simple but effective approach to set a lower bound of the expectation by regularizing the distribution of the approximate posterior’s parameters. Without introducing any new model component or modifying the objective, our approach can avoid the posterior collapse effectively and efficiently. We further show that the proposed BN-VAE can be extended to conditional VAE (CVAE). Empirically, our approach surpasses strong autoregressive baselines on language modeling, text classification and dialogue generation, and rivals more complex approaches while keeping almost the same training time as VAE.

pdf bib
Don’t Parse, Insert: Multilingual Semantic Parsing with Insertion Based Decoding
Qile Zhu | Haidar Khan | Saleh Soltan | Stephen Rawls | Wael Hamza
Proceedings of the 24th Conference on Computational Natural Language Learning

Semantic parsing is one of the key components of natural language understanding systems. A successful parse transforms an input utterance to an action that is easily understood by the system. Many algorithms have been proposed to solve this problem, from conventional rule-based or statistical slot-filling systems to shift-reduce based neural parsers. For complex parsing tasks, the state-of-the-art method is based on an autoregressive sequence to sequence model that generates the parse directly. This model is slow at inference time, generating parses in O(n) decoding steps (n is the length of the target sequence). In addition, we demonstrate that this method performs poorly in zero-shot cross-lingual transfer learning settings. In this paper, we propose a non-autoregressive parser which is based on the insertion transformer to overcome these two issues. Our approach 1) speeds up decoding by 3x while outperforming the autoregressive model and 2) significantly improves cross-lingual transfer in the low-resource setting by 37% compared to autoregressive baseline. We test our approach on three wellknown monolingual datasets: ATIS, SNIPS and TOP. For cross-lingual semantic parsing, we use the MultiATIS++ and the multilingual TOP datasets.


pdf bib
GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model
Qile Zhu | Zheng Feng | Xiaolin Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Discovering the latent topics within texts has been a fundamental task for many applications. However, conventional topic models suffer different problems in different settings. The Latent Dirichlet Allocation (LDA) may not work well for short texts due to the data sparsity (i.e. the sparse word co-occurrence patterns in short documents). The Biterm Topic Model (BTM) learns topics by modeling the word-pairs named biterms in the whole corpus. This assumption is very strong when documents are long with rich topic information and do not exhibit the transitivity of biterms. In this paper, we propose a novel way called GraphBTM to represent biterms as graphs and design a Graph Convolutional Networks (GCNs) with residual connections to extract transitive features from biterms. To overcome the data sparsity of LDA and the strong assumption of BTM, we sample a fixed number of documents to form a mini-corpus as a sample. We also propose a dataset called All News extracted from 15 news publishers, in which documents are much longer than 20 Newsgroups. We present an amortized variational inference method for GraphBTM. Our method generates more coherent topics compared with previous approaches. Experiments show that the sampling strategy improves performance by a large margin.


pdf bib
Character Sequence-to-Sequence Model with Global Attention for Universal Morphological Reinflection
Qile Zhu | Yanjun Li | Xiaolin Li
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection