Kenny Zhu

Also published as: Kenny Q. Zhu


2022

pdf bib
Length Control in Abstractive Summarization by Pretraining Information Selection
Yizhu Liu | Qi Jia | Kenny Zhu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Previous length-controllable summarization models mostly control lengths at the decoding stage, whereas the encoding or the selection of information from the source document is not sensitive to the designed length. They also tend to generate summaries as long as those in the training data. In this paper, we propose a length-aware attention mechanism (LAAM) to adapt the encoding of the source based on the desired length. Our approach works by training LAAM on a summary length balanced dataset built from the original training data, and then fine-tuning as usual. Results show that this approach is effective in generating high-quality summaries with desired lengths and even those short lengths never seen in the original training set.

pdf bib
ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments
Ruolan Yang | Zitong Li | Haifeng Tang | Kenny Zhu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Existing automatic evaluation systems of chatbots mostly rely on static chat scripts as ground truth, which is hard to obtain, and requires access to the models of the bots as a form of “white-box testing”. Interactive evaluation mitigates this problem but requires human involvement. In our work, we propose an interactive chatbot evaluation framework in which chatbots compete with each other like in a sports tournament, using flexible scoring metrics. This framework can efficiently rank chatbots independently from their model architectures and the domains for which they are trained.

pdf bib
Reference-free Summarization Evaluation via Semantic Correlation and Compression Ratio
Yizhu Liu | Qi Jia | Kenny Zhu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A document can be summarized in a number of ways. Reference-based evaluation of summarization has been criticized for its inflexibility. The more sufficient the number of abstracts, the more accurate the evaluation results. However, it is difficult to collect sufficient reference summaries. In this paper, we propose a new automatic reference-free evaluation metric that compares semantic distribution between source document and summary by pretrained language models and considers summary compression ratio. The experiments show that this metric is more consistent with human evaluation in terms of coherence, consistency, relevance and fluency.

pdf bib
Leaner and Faster: Two-Stage Model Compression for Lightweight Text-Image Retrieval
Siyu Ren | Kenny Zhu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Current text-image approaches (e.g., CLIP) typically adopt dual-encoder architecture using pre-trained vision-language representation. However, these models still pose non-trivial memory requirements and substantial incremental indexing time, which makes them less practical on mobile devices. In this paper, we present an effective two-stage framework to compress large pre-trained dual-encoder for lightweight text-image retrieval. The resulting model is smaller (39% of the original), faster (1.6x/2.9x for processing image/text respectively), yet performs on par with or better than the original full model on Flickr30K and MSCOCO benchmarks. We also open-source an accompanying realistic mobile image search application.

pdf bib
Post-Training Dialogue Summarization using Pseudo-Paraphrasing
Qi Jia | Yizhu Liu | Haifeng Tang | Kenny Zhu
Findings of the Association for Computational Linguistics: NAACL 2022

Previous dialogue summarization techniques adapt large language models pretrained on the narrative text by injecting dialogue-specific features into the models. These features either require additional knowledge to recognize or make the resulting models harder to tune. To bridge the format gap between dialogues and narrative summaries in dialogue summarization tasks, we propose to post-train pretrained language models (PLMs) to rephrase from dialogue to narratives. After that, the model is fine-tuned for dialogue summarization as usual. Comprehensive experiments show that our approach significantly improves vanilla PLMs on dialogue summarization and outperforms other SOTA models by the summary quality and implementation costs.

pdf bib
Specializing Pre-trained Language Models for Better Relational Reasoning via Network Pruning
Siyu Ren | Kenny Zhu
Findings of the Association for Computational Linguistics: NAACL 2022

Pretrained masked language models (PLMs) were shown to be inheriting a considerable amount of relational knowledge from the source corpora. In this paper, we present an in-depth and comprehensive study concerning specializing PLMs into relational models from the perspective of network pruning. We show that it is possible to find subnetworks capable of representing grounded commonsense relations at non-trivial sparsity while being more generalizable than original PLMs in scenarios requiring knowledge of single or multiple commonsense relations.

2020

pdf bib
Multi-turn Response Selection using Dialogue Dependency Relations
Qi Jia | Yizhu Liu | Siyu Ren | Kenny Zhu | Haifeng Tang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Multi-turn response selection is a task designed for developing dialogue agents. The performance on this task has a remarkable improvement with pre-trained language models. However, these models simply concatenate the turns in dialogue history as the input and largely ignore the dependencies between the turns. In this paper, we propose a dialogue extraction algorithm to transform a dialogue history into threads based on their dependency relations. Each thread can be regarded as a self-contained sub-dialogue. We also propose Thread-Encoder model to encode threads and candidates into compact representations by pre-trained Transformers and finally get the matching score through an attention layer. The experiments show that dependency relations are helpful for dialogue context understanding, and our model outperforms the state-of-the-art baselines on both DSTC7 and DSTC8*, with competitive results on UbuntuV2.

2019

pdf bib
Low-Resource Sequence Labeling via Unsupervised Multilingual Contextualized Representations
Zuyi Bao | Rui Huang | Chen Li | Kenny Zhu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Previous work on cross-lingual sequence labeling tasks either requires parallel data or bridges the two languages through word-by-word matching. Such requirements and assumptions are infeasible for most languages, especially for languages with large linguistic distances, e.g., English and Chinese. In this work, we propose a Multilingual Language Model with deep semantic Alignment (MLMA) to generate language-independent representations for cross-lingual sequence labeling. Our methods require only monolingual corpora with no bilingual resources at all and take advantage of deep contextualized representations. Experimental results show that our approach achieves new state-of-the-art NER and POS performance across European languages, and is also effective on distant language pairs such as English and Chinese.

2018

pdf bib
Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text
Junjie Xing | Kenny Zhu | Shaodian Zhang
Proceedings of the 27th International Conference on Computational Linguistics

Chinese word segmentation (CWS) trained from open source corpus faces dramatic performance drop when dealing with domain text, especially for a domain with lots of special terms and diverse writing styles, such as the biomedical domain. However, building domain-specific CWS requires extremely high annotation cost. In this paper, we propose an approach by exploiting domain-invariant knowledge from high resource to low resource domains. Extensive experiments show that our model achieves consistently higher accuracy than the single-task CWS and other transfer learning baselines, especially when there is a large disparity between source and target domains.

pdf bib
Knowledge Base Question Answering via Encoding of Complex Query Graphs
Kangqi Luo | Fengli Lin | Xusheng Luo | Kenny Zhu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Answering complex questions that involve multiple entities and multiple relations using a standard knowledge base is an open and challenging task. Most existing KBQA approaches focus on simpler questions and do not work very well on complex questions because they were not able to simultaneously represent the question and the corresponding complex query structure. In this work, we encode such complex query structure into a uniform vector representation, and thus successfully capture the interactions between individual semantic components within a complex question. This approach consistently outperforms existing methods on complex questions while staying competitive on simple questions.

pdf bib
ExtRA: Extracting Prominent Review Aspects from Customer Feedback
Zhiyi Luo | Shanshan Huang | Frank F. Xu | Bill Yuchen Lin | Hanyuan Shi | Kenny Zhu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Many existing systems for analyzing and summarizing customer reviews about products or service are based on a number of prominent review aspects. Conventionally, the prominent review aspects of a product type are determined manually. This costly approach cannot scale to large and cross-domain services such as Amazon.com, Taobao.com or Yelp.com where there are a large number of product types and new products emerge almost every day. In this paper, we propose a novel framework, for extracting the most prominent aspects of a given product type from textual reviews. The proposed framework, ExtRA, extracts K most prominent aspect terms or phrases which do not overlap semantically automatically without supervision. Extensive experiments show that ExtRA is effective and achieves the state-of-the-art performance on a dataset consisting of different product types.

pdf bib
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network
Yizhu Liu | Zhiyi Luo | Kenny Zhu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Convolutional neural networks (CNNs) have met great success in abstractive summarization, but they cannot effectively generate summaries of desired lengths. Because generated summaries are used in difference scenarios which may have space or length constraints, the ability to control the summary length in abstractive summarization is an important problem. In this paper, we propose an approach to constrain the summary length by extending a convolutional sequence to sequence model. The results show that this approach generates high-quality summaries with user defined length, and outperforms the baselines consistently in terms of ROUGE score, length variations and semantic similarity.

pdf bib
Mining Cross-Cultural Differences and Similarities in Social Media
Bill Yuchen Lin | Frank F. Xu | Kenny Zhu | Seung-won Hwang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-cultural differences and similarities are common in cross-lingual natural language understanding, especially for research in social media. For instance, people of distinct cultures often hold different opinions on a single named entity. Also, understanding slang terms across languages requires knowledge of cross-cultural similarities. In this paper, we study the problem of computing such cross-cultural differences and similarities. We present a lightweight yet effective approach, and evaluate it on two novel tasks: 1) mining cross-cultural differences of named entities and 2) finding similar terms for slang across languages. Experimental results show that our framework substantially outperforms a number of baseline methods on both tasks. The framework could be useful for machine translation applications and research in computational social science.

pdf bib
Automatic Extraction of Commonsense LocatedNear Knowledge
Frank F. Xu | Bill Yuchen Lin | Kenny Zhu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

LocatedNear relation is a kind of commonsense knowledge describing two physical objects that are typically found near each other in real life. In this paper, we study how to automatically extract such relationship through a sentence-level relation classifier and aggregating the scores of entity pairs from a large corpus. Also, we release two benchmark datasets for evaluation and future research.

2017

pdf bib
Multi-channel BiLSTM-CRF Model for Emerging Named Entity Recognition in Social Media
Bill Y. Lin | Frank Xu | Zhiyi Luo | Kenny Zhu
Proceedings of the 3rd Workshop on Noisy User-generated Text

In this paper, we present our multi-channel neural architecture for recognizing emerging named entity in social media messages, which we applied in the Novel and Emerging Named Entity Recognition shared task at the EMNLP 2017 Workshop on Noisy User-generated Text (W-NUT). We propose a novel approach, which incorporates comprehensive word representations with multi-channel information and Conditional Random Fields (CRF) into a traditional Bidirectional Long Short-Term Memory (BiLSTM) neural network without using any additional hand-craft features such as gazetteers. In comparison with other systems participating in the shared task, our system won the 2nd place.

2016

pdf bib
Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Shi Feng | Shujie Liu | Nan Yang | Mu Li | Ming Zhou | Kenny Q. Zhu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence. However, without dedicated distortion and fertility models seen in traditional SMT systems, the learned alignment may not be accurate, which can lead to low translation quality. In this paper, we propose two novel models to improve attention-based neural machine translation. We propose a recurrent attention mechanism as an implicit distortion model, and a fertility conditioned decoder as an implicit fertility model. We conduct experiments on large-scale Chinese–English translation tasks. The results show that our models significantly improve both the alignment and translation quality compared to the original attention mechanism and several other variations.

pdf bib
Towards Non-projective High-Order Dependency Parser
Wenjing Fang | Kenny Zhu | Yizhong Wang | Jia Tan
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This paper presents a novel high-order dependency parsing framework that targets non-projective treebanks. It imitates how a human parses sentences in an intuitive way. At every step of the parse, it determines which word is the easiest to process among all the remaining words, identifies its head word and then folds it under the head word. Further, this work is flexible enough to be augmented with other parsing techniques.

2015

pdf bib
Inferring Binary Relation Schemas for Open Information Extraction
Kangqi Luo | Xusheng Luo | Kenny Zhu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2013

pdf bib
Data-Driven Metaphor Recognition and Explanation
Hongsong Li | Kenny Q. Zhu | Haixun Wang
Transactions of the Association for Computational Linguistics, Volume 1

Recognizing metaphors and identifying the source-target mappings is an important task as metaphorical text poses a big challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using the knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in the text. To our knowledge, this is the first purely data-driven approach of probabilistic metaphor acquisition, recognition, and explanation. Our results shows that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.