Kenny Zhu - ACL Anthology

Kenny Zhu

Also published as: Kenny Q. Zhu

Other people with similar names: Kenny Q. Zhu

2025

A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song | Mengyue Wu | Kenny Q. Zhu | Chunhao Zhang | Yanyi Chen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Vision-Language Models (LVLMs), despite their recent success, are hardly comprehensively tested for their cognitive abilities. Inspired by the prevalent use of the Cookie Theft task in human cognitive tests, we propose a novel evaluation benchmark to evaluate high-level cognitive abilities of LVLMs using images with rich semantics. The benchmark consists of 251 images along with comprehensive annotations. It defines eight reasoning capabilities and comprises an image description task and a visual question answering task. Our evaluation of well-known LVLMs shows that there is still a significant gap in cognitive abilities between LVLMs and humans.

MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics
Haoan Jin | Jiacheng Shi | Hanhui Xu | Kenny Q. Zhu | Mengyue Wu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

Large language models (LLMs) demonstrate significant potential in advancing medical applications, yet their capabilities in addressing medical ethics challenges remain underexplored. This paper introduces MedEthicEval, a novel benchmark designed to systematically evaluate LLMs in the domain of medical ethics. Our framework encompasses two key components: knowledge, assessing the models’ grasp of medical ethics principles, and application, focusing on their ability to apply these principles across diverse scenarios. To support this benchmark, we consulted with medical ethics researchers and developed three datasets addressing distinct ethical challenges: blatant violations of medical ethics, priority dilemmas with clear inclinations, and equilibrium dilemmas without obvious resolutions. MedEthicEval serves as a critical tool for understanding LLMs’ ethical reasoning in healthcare, paving the way for their responsible and effective use in medical contexts.

A Diverse and Effective Retrieval-Based Debt Collection System with Expert Knowledge
Jiaming Luo | Weiyi Luo | Guoqing Sun | Mengchen Zhu | Haifeng Tang | Kenny Q. Zhu | Mengyue Wu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

Designing effective debt collection systems is crucial for improving operational efficiency and reducing costs in the financial industry. However, the challenges of maintaining script diversity, contextual relevance, and coherence make this task particularly difficult. This paper presents a debt collection system based on real debtor-collector data from a major commercial bank. We construct a script library from real-world debt collection conversations, and propose a two-stage retrieval based response system for contextual relevance. Experimental results show that our system improves script diversity, enhances response relevance, and achieves practical deployment efficiency through knowledge distillation. This work offers a scalable and automated solution, providing valuable insights for advancing debt collection practices in real-world applications.

Toward Automatic Discovery of a Canine Phonetic Alphabet
Theron S. Wang | Xingyuan Li | Hridayesh Lekhak | Tuan Minh Dang | Mengyue Wu | Kenny Q. Zhu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dogs communicate intelligently but little is known about the phonetic properties of their vocalization communication. For the first time, this paper presents an iterative algorithm inspired by human phonetic discovery, which is based on minimal pairs that determine phonemes by distinguishing different words in human language, and is able to produce a complete alphabet of distinct canine phoneme-like units. In addition, the algorithm produces a number of canine repeated acoustic units, which may correspond to specific environments and activities of a dog, composed exclusively of the canine phoneme-like units in the alphabet. The framework outlined in this paper is expected to function not only on canines but other animal species.

Tracking Life’s Ups and Downs: Mining Life Events from Social Media Posts for Mental Health Analysis
Minghao Lv | Siyuan Chen | Haoan Jin | Minghao Yuan | Qianqian Ju | Yujia Peng | Kenny Q. Zhu | Mengyue Wu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Social media platforms possess considerable potential in the realm of exploring mental health. Previous research has indicated that major life events can greatly impact individuals’ mental health. However, due to the complexity and ambiguity nature of life events, shedding its light on social media data is quite challenging. In this paper, we are dedicated to uncovering life events mentioned in posts on social media. We hereby provide a carefully-annotated social media event dataset, PsyEvent, which encompasses 12 major life event categories that are likely to occur in everyday life. This dataset is human-annotated under iterative procedure and boasts a high level of quality. Furthermore, by applying the life events extracted from posts to downstream tasks such as early risk detection of depression and suicide risk prediction, we have observed a considerable improvement in performance. This suggests that extracting life events from social media can be beneficial for the analysis of individuals’ mental health.

2024

Mapping Long-term Causalities in Psychiatric Symptomatology and Life Events from Social Media
Siyuan Chen | Meilin Wang | Minghao Lv | Zhiling Zhang | Qianqian Ju | Dejiyangla | Yujia Peng | Kenny Q. Zhu | Mengyue Wu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Social media is a valuable data source for exploring mental health issues. However, previous studies have predominantly focused on the semantic content of these posts, overlooking the importance of their temporal attributes, as well as the evolving nature of mental disorders and symptoms.In this paper, we study the causality between psychiatric symptoms and life events, as well as among different symptoms from social media posts, which leads to better understanding of the underlying mechanisms of mental disorders. By applying these extracted causality features to tasks such as diagnosis point detection and early risk detection of depression, we notice considerable performance enhancement. This indicates that causality information extracted from social media data can boost the efficacy of mental disorder diagnosis and treatment planning.

Low-Rank Prune-And-Factorize for Language Model Compression
Siyu Ren | Kenny Q. Zhu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The components underpinning PLMs—large weight matrices—were shown to bear considerable redundancy. Matrix factorization, a well-established technique from matrix theory, has been utilized to reduce the number of parameters in PLM. However, it fails to retain satisfactory performance under moderate to high compression rates. In this paper, we identify the full-rankness of fine-tuned PLM as the fundamental bottleneck for the failure of matrix factorization and explore the use of network pruning to extract low-rank sparsity pattern desirable to matrix factorization. We find such a low-rank sparsity pattern exclusively exists in models generated by first-order pruning, which motivates us to unite the two approaches and achieve more effective model compression. We further propose two techniques: sparsity-aware SVD and mixed-rank fine-tuning, which improve the initialization and training of the compression procedure, respectively. Experiments on GLUE and question-answering tasks show that the proposed method has a superior compression-performance trade-off compared to existing approaches.

Phonetic and Lexical Discovery of Canine Vocalization
Theron S. Wang | Xingyuan Li | Chunhao Zhang | Mengyue Wu | Kenny Q. Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024

This paper attempts to discover communication patterns automatically within dog vocalizations in a data-driven approach, which breaks the barrier previous approaches that rely on human prior knowledge on limited data. We present a self-supervised approach with HuBERT, enabling the accurate classification of phones, and an adaptive grammar induction method that identifies phone sequence patterns that suggest a preliminary vocabulary within dog vocalizations. Our results show that a subset of this vocabulary has substantial causality relations with certain canine activities, suggesting signs of stable semantics associated with these “words”.

Automatic Reconstruction of Ancient Chinese Pronunciations
Zhige Huang | Haoan Jin | Mengyue Wu | Kenny Q. Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024

Reconstructing ancient Chinese pronunciation is a challenging task due to the scarcity of phonetic records. Different from historical linguistics’ comparative approaches, we reformulate this problem into a temporal prediction task with masked language models, digitizing existing phonology rules into ACP (Ancient Chinese Phonology) dataset of 70,943 entries for 17,001 Chinese characters. Utilizing this dataset and Chinese character glyph information, our transformer-based model demonstrates superior performance on a series of reconstruction tasks, with or without prior phonological knowledge on the target historical period. Our work significantly advances the digitization and computational reconstruction of ancient Chinese phonology, providing a more complete and temporally contextualized resource for computational linguistics and historical research. The dataset and model training code are publicly available.

2023

Statistically Profiling Biases in Natural Language Reasoning Datasets and Models
Shanshan Huang | Kenny Zhu
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent studies have shown that many natural language understanding and reasoning datasets contain statistical cues that can be exploited by NLP models, resulting in an overestimation of their capabilities. Existing methods, such as “hypothesis-only” tests and CheckList, are limited in identifying these cues and evaluating model weaknesses. We introduce ICQ (I-See-Cue), a lightweight, general statistical profiling framework that automatically identifies potential biases in multiple-choice NLU datasets without requiring additional test cases. ICQ assesses the extent to which models exploit these biases through black-box testing, addressing the limitations of current methods. In this work, we conduct a comprehensive evaluation of statistical biases in 10 popular NLU datasets and 4 models, confirming prior findings, revealing new insights, and offering an online demonstration system to encourage users to assess their own datasets and models. Furthermore, we present a case study on investigating ChatGPT’s bias, providing valuable recommendations for practical applications.

Transcribing Vocal Communications of Domestic Shiba lnu Dogs
Jieyi Huang | Chunhao Zhang | Mengyue Wu | Kenny Zhu
Findings of the Association for Computational Linguistics: ACL 2023

How animals communicate and whether they have languages is a persistent curiosity of human beings. However, the study of animal communications has been largely restricted to data from field recordings or in a controlled environment, which is expensive and limited in scale and variety. In this paper, we take domestic Shiba Inu dogs as an example, and extract their vocal communications from large amount of YouTube videos of Shiba Inu dogs. We classify these clips into different scenarios and locations, and further transcribe the audio into phonetically symbolic scripts through a systematic process. We discover consistent phonetic symbols among their expressions, which indicates that Shiba Inu dogs can have systematic verbal communication patterns. This reusable framework produces the first-of-its-kind Shiba Inu vocal communication dataset that will be valuable to future research in both zoology and linguistics.

Pruning Pre-trained Language Models with Principled Importance and Self-regularization
Siyu Ren | Kenny Zhu
Findings of the Association for Computational Linguistics: ACL 2023

Iterative pruning is one of the most effective compression methods for pre-trained language models. We discovered that finding the optimal pruning decision is an equality-constrained 0-1 Integer Linear Programming problem. The solution to this optimization problem leads to a principled importance criterion which we use to rank parameters during iterative model pruning. To mitigate the poor generalization at high sparsity levels, we propose a self-regularization scheme where model prediction is regularized by the latest checkpoint with increasing sparsity throughout pruning. Our experiments on natural language understanding, question answering, named entity recognition, and data-to-text generation with various Transformer-based PLMs show the effectiveness of the approach at various sparsity levels.

Incomplete Utterance Rewriting by A Two-Phase Locate-and-Fill Regime
Zitong Li | Jiawei Li | Haifeng Tang | Kenny Zhu | Ruolan Yang
Findings of the Association for Computational Linguistics: ACL 2023

Rewriting incomplete and ambiguous utterances can improve dialogue models’ understanding of the context and help them generate better results. However, the existing end-to-end models will have the problem of too large search space, resulting in poor quality of rewriting results. We propose a 2-phase rewriting framework which first predicts the empty slots in the utterance that need to be completed, and then generate the part to be filled into each positions. Our framework is simple to implement, fast to run, and achieves the state-of-the-art results on several public rewriting datasets.

Reducing Sensitivity on Speaker Names for Text Generation from Dialogues
Qi Jia | Haifeng Tang | Kenny Zhu
Findings of the Association for Computational Linguistics: ACL 2023

Changing speaker names consistently throughout a dialogue should not affect its meaning and corresponding outputs for text generation from dialogues. However, pre-trained language models, serving as the backbone for dialogue-processing tasks, have shown to be sensitive to nuances. This may result in unfairness in real-world applications. No comprehensive analysis of this problem has been done in the past. In this work, we propose to quantitatively measure a model’s sensitivity on speaker names, and comprehensively evaluate a number of known methods for reducing speaker name sensitivity, including a novel approach of our own. Extensive experiments on multiple datasets provide a benchmark for this problem and show the favorable performance of our approach in sensitivity reduction and quality of generation.

Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation
Zhiling Zhang | Mengyue Wu | Kenny Zhu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Controlling chatbot utterance generation with multiple attributes such as personalities, emotions and dialogue acts is a practically useful but under-studied problem. We propose a novel framework called DASC that possesses strong controllability with a weighted decoding paradigm, while improving generation quality with the grounding in an attribute semantics space. Generation with multiple attributes is then intuitively implemented with an interpolation of multiple attribute embeddings, which results in substantial reduction in the model sizes. Experiments show that DASC can achieve high control accuracy in generation task with the simultaneous control of 3 aspects while also producing interesting and reasonably sensible responses, even in an out-of-distribution robustness test.

Context Compression for Auto-regressive Transformers with Sentinel Tokens
Siyu Ren | Qi Jia | Kenny Zhu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The quadratic complexity of the attention module makes it gradually become the bulk of compute in Transformer-based LLMs during generation. Moreover, the excessive key-value cache that arises when dealing with long inputs also brings severe issues on memory footprint and inference latency. In this work, we propose a plug-and-play approach that is able to incrementally compress the intermediate activation of a specified span of tokens into compact ones, thereby reducing both memory and computational cost when processing subsequent context. Experiments on both in-domain language modeling and zero-shot open-ended document generation demonstrate the advantage of our approach over sparse attention baselines in terms of fluency, n-gram matching, and semantic similarity. At last, we comprehensively profile the benefit of context compression on improving the system throughout. Code is available at https://github.com/DRSY/KV_Compression.

Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model
Qi Jia | Siyu Ren | Yizhu Liu | Kenny Zhu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Despite tremendous improvements in natural language generation, summarization models still suffer from the unfaithfulness issue. Previous work evaluates faithfulness either using models trained on the other tasks or in-domain synthetic data, or prompting a large model such as ChatGPT. This paper proposes to do zero-shot faithfulness evaluation simply with a moderately-sized foundation language model. We introduce a new metric FFLM, which is a combination of probability changes based on the intuition that prefixing a piece of text that is consistent with the output will increase the probability of predicting the output. Experiments show that FFLM performs competitively with or even outperforms ChatGPT on both inconsistency detection and faithfulness rating with 24x fewer parameters. FFLM also achieves improvements over other strong baselines.

Detection of Multiple Mental Disorders from Social Media with Two-Stream Psychiatric Experts
Siyuan Chen | Zhiling Zhang | Mengyue Wu | Kenny Zhu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Existing Mental Disease Detection (MDD) research largely studies the detection of a single disorder, overlooking the fact that mental diseases might occur in tandem. Many approaches are not backed by domain knowledge (e.g., psychiatric symptoms) and thus fail to produce interpretable results. To tackle these issues, we propose an MDD framework that is capable of learning the shared clues of all diseases, while also capturing the specificity of each single disease. The two-stream architecture which simultaneously processes text and symptom features can combine the strength of both modalities and offer knowledge-based explainability. Experiments on the detection of 7 diseases show that our model can boost detection performance by more than 10%, especially in relatively rare classes.

In-sample Curriculum Learning by Sequence Completion for Natural Language Generation
Qi Jia | Yizhu Liu | Haifeng Tang | Kenny Zhu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Curriculum learning has shown promising improvements in multiple domains by training machine learning models from easy samples to hard ones. Previous works which either design rules or train models for scoring the difficulty highly rely on task-specific expertise, and cannot generalize. Inspired by the “easy-to-hard” intuition, we propose to do in-sample curriculum learning for natural language generation tasks. Our learning strategy starts training the model to generate the last few words, i.e., do sequence completion, and gradually extends to generate the whole output sequence. Comprehensive experiments show that it generalizes well to different tasks and achieves significant improvements over strong baselines.

Transferable and Efficient: Unifying Dynamic Multi-Domain Product Categorization
Shansan Gong | Zelin Zhou | Shuo Wang | Fengjiao Chen | Xiujie Song | Xuezhi Cao | Yunsen Xian | Kenny Zhu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

As e-commerce platforms develop different business lines, a special but challenging product categorization scenario emerges, where there are multiple domain-specific category taxonomies and each of them evolves dynamically over time. In order to unify the categorization process and ensure efficiency, we propose a two-stage taxonomy-agnostic framework that relies solely on calculating the semantic relatedness between product titles and category names in the vector space. To further enhance domain transferability and better exploit cross-domain data, we design two plug-in modules: a heuristic mapping scorer and a pretrained contrastive ranking module with the help of meta concepts, which represent keyword knowledge shared across domains. Comprehensive offline experiments show that our method outperforms strong baselineson three dynamic multi-domain product categorization (DMPC) tasks,and online experiments reconfirm its efficacy with a5% increase on seasonal purchase revenue. Related datasets will be released.

2022

Leaner and Faster: Two-Stage Model Compression for Lightweight Text-Image Retrieval
Siyu Ren | Kenny Zhu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Current text-image approaches (e.g., CLIP) typically adopt dual-encoder architecture using pre-trained vision-language representation. However, these models still pose non-trivial memory requirements and substantial incremental indexing time, which makes them less practical on mobile devices. In this paper, we present an effective two-stage framework to compress large pre-trained dual-encoder for lightweight text-image retrieval. The resulting model is smaller (39% of the original), faster (1.6x/2.9x for processing image/text respectively), yet performs on par with or better than the original full model on Flickr30K and MSCOCO benchmarks. We also open-source an accompanying realistic mobile image search application.

Reference-free Summarization Evaluation via Semantic Correlation and Compression Ratio
Yizhu Liu | Qi Jia | Kenny Zhu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A document can be summarized in a number of ways. Reference-based evaluation of summarization has been criticized for its inflexibility. The more sufficient the number of abstracts, the more accurate the evaluation results. However, it is difficult to collect sufficient reference summaries. In this paper, we propose a new automatic reference-free evaluation metric that compares semantic distribution between source document and summary by pretrained language models and considers summary compression ratio. The experiments show that this metric is more consistent with human evaluation in terms of coherence, consistency, relevance and fluency.

Specializing Pre-trained Language Models for Better Relational Reasoning via Network Pruning
Siyu Ren | Kenny Zhu
Findings of the Association for Computational Linguistics: NAACL 2022

Pretrained masked language models (PLMs) were shown to be inheriting a considerable amount of relational knowledge from the source corpora. In this paper, we present an in-depth and comprehensive study concerning specializing PLMs into relational models from the perspective of network pruning. We show that it is possible to find subnetworks capable of representing grounded commonsense relations at non-trivial sparsity while being more generalizable than original PLMs in scenarios requiring knowledge of single or multiple commonsense relations.

Post-Training Dialogue Summarization using Pseudo-Paraphrasing
Qi Jia | Yizhu Liu | Haifeng Tang | Kenny Zhu
Findings of the Association for Computational Linguistics: NAACL 2022

Previous dialogue summarization techniques adapt large language models pretrained on the narrative text by injecting dialogue-specific features into the models. These features either require additional knowledge to recognize or make the resulting models harder to tune. To bridge the format gap between dialogues and narrative summaries in dialogue summarization tasks, we propose to post-train pretrained language models (PLMs) to rephrase from dialogue to narratives. After that, the model is fine-tuned for dialogue summarization as usual. Comprehensive experiments show that our approach significantly improves vanilla PLMs on dialogue summarization and outperforms other SOTA models by the summary quality and implementation costs.

Symptom Identification for Interpretable Detection of Multiple Mental Disorders on Social Media
Zhiling Zhang | Siyuan Chen | Mengyue Wu | Kenny Zhu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Mental disease detection (MDD) from social media has suffered from poor generalizability and interpretability, due to lack of symptom modeling. This paper introduces PsySym, the first annotated symptom identification corpus of multiple psychiatric disorders, to facilitate further research progress. PsySym is annotated according to a knowledge graph of the 38 symptom classes related to 7 mental diseases complied from established clinical manuals and scales, and a novel annotation framework for diversity and quality. Experiments show that symptom-assisted MDD enabled by PsySym can outperform strong pure-text baselines. We also exhibit the convincing MDD explanations provided by symptom predictions with case studies, and point to their further potential applications.

Opinion Summarization by Weak-Supervision from Mix-structured Data
Yizhu Liu | Qi Jia | Kenny Zhu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Opinion summarization of multiple reviews suffers from the lack of reference summaries for training.Most previous approaches construct multiple reviews and their summary based on textual similarities between reviews,resulting in information mismatch between the review input and the summary. In this paper, we convert each review into a mixof structured and unstructured data, which we call opinion-aspect pairs (OAs) and implicit sentences (ISs).We propose a new method to synthesize training pairs of such mix-structured data as input and the textual summary as output,and design a summarization model with OA encoder and IS encoder.Experiments show that our approach outperforms previous methods on Yelp, Amazon and RottenTomatos datasets.

ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments
Ruolan Yang | Zitong Li | Haifeng Tang | Kenny Zhu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Existing automatic evaluation systems of chatbots mostly rely on static chat scripts as ground truth, which is hard to obtain, and requires access to the models of the bots as a form of “white-box testing”. Interactive evaluation mitigates this problem but requires human involvement. In our work, we propose an interactive chatbot evaluation framework in which chatbots compete with each other like in a sports tournament, using flexible scoring metrics. This framework can efficiently rank chatbots independently from their model architectures and the domains for which they are trained.

Length Control in Abstractive Summarization by Pretraining Information Selection
Yizhu Liu | Qi Jia | Kenny Zhu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Previous length-controllable summarization models mostly control lengths at the decoding stage, whereas the encoding or the selection of information from the source document is not sensitive to the designed length. They also tend to generate summaries as long as those in the training data. In this paper, we propose a length-aware attention mechanism (LAAM) to adapt the encoding of the source based on the desired length. Our approach works by training LAAM on a summary length balanced dataset built from the original training data, and then fine-tuning as usual. Results show that this approach is effective in generating high-quality summaries with desired lengths and even those short lengths never seen in the original training set.

2020

Multi-turn Response Selection using Dialogue Dependency Relations
Qi Jia | Yizhu Liu | Siyu Ren | Kenny Zhu | Haifeng Tang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Multi-turn response selection is a task designed for developing dialogue agents. The performance on this task has a remarkable improvement with pre-trained language models. However, these models simply concatenate the turns in dialogue history as the input and largely ignore the dependencies between the turns. In this paper, we propose a dialogue extraction algorithm to transform a dialogue history into threads based on their dependency relations. Each thread can be regarded as a self-contained sub-dialogue. We also propose Thread-Encoder model to encode threads and candidates into compact representations by pre-trained Transformers and finally get the matching score through an attention layer. The experiments show that dependency relations are helpful for dialogue context understanding, and our model outperforms the state-of-the-art baselines on both DSTC7 and DSTC8*, with competitive results on UbuntuV2.

2019

Low-Resource Sequence Labeling via Unsupervised Multilingual Contextualized Representations
Zuyi Bao | Rui Huang | Chen Li | Kenny Zhu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Previous work on cross-lingual sequence labeling tasks either requires parallel data or bridges the two languages through word-by-word matching. Such requirements and assumptions are infeasible for most languages, especially for languages with large linguistic distances, e.g., English and Chinese. In this work, we propose a Multilingual Language Model with deep semantic Alignment (MLMA) to generate language-independent representations for cross-lingual sequence labeling. Our methods require only monolingual corpora with no bilingual resources at all and take advantage of deep contextualized representations. Experimental results show that our approach achieves new state-of-the-art NER and POS performance across European languages, and is also effective on distant language pairs such as English and Chinese.

2018

Automatic Extraction of Commonsense LocatedNear Knowledge
Frank F. Xu | Bill Yuchen Lin | Kenny Zhu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

LocatedNear relation is a kind of commonsense knowledge describing two physical objects that are typically found near each other in real life. In this paper, we study how to automatically extract such relationship through a sentence-level relation classifier and aggregating the scores of entity pairs from a large corpus. Also, we release two benchmark datasets for evaluation and future research.

Mining Cross-Cultural Differences and Similarities in Social Media
Bill Yuchen Lin | Frank F. Xu | Kenny Zhu | Seung-won Hwang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-cultural differences and similarities are common in cross-lingual natural language understanding, especially for research in social media. For instance, people of distinct cultures often hold different opinions on a single named entity. Also, understanding slang terms across languages requires knowledge of cross-cultural similarities. In this paper, we study the problem of computing such cross-cultural differences and similarities. We present a lightweight yet effective approach, and evaluate it on two novel tasks: 1) mining cross-cultural differences of named entities and 2) finding similar terms for slang across languages. Experimental results show that our framework substantially outperforms a number of baseline methods on both tasks. The framework could be useful for machine translation applications and research in computational social science.

Controlling Length in Abstractive Summarization Using a Convolutional Neural Network
Yizhu Liu | Zhiyi Luo | Kenny Zhu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Convolutional neural networks (CNNs) have met great success in abstractive summarization, but they cannot effectively generate summaries of desired lengths. Because generated summaries are used in difference scenarios which may have space or length constraints, the ability to control the summary length in abstractive summarization is an important problem. In this paper, we propose an approach to constrain the summary length by extending a convolutional sequence to sequence model. The results show that this approach generates high-quality summaries with user defined length, and outperforms the baselines consistently in terms of ROUGE score, length variations and semantic similarity.

ExtRA: Extracting Prominent Review Aspects from Customer Feedback
Zhiyi Luo | Shanshan Huang | Frank F. Xu | Bill Yuchen Lin | Hanyuan Shi | Kenny Zhu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Many existing systems for analyzing and summarizing customer reviews about products or service are based on a number of prominent review aspects. Conventionally, the prominent review aspects of a product type are determined manually. This costly approach cannot scale to large and cross-domain services such as Amazon.com, Taobao.com or Yelp.com where there are a large number of product types and new products emerge almost every day. In this paper, we propose a novel framework, for extracting the most prominent aspects of a given product type from textual reviews. The proposed framework, ExtRA, extracts K most prominent aspect terms or phrases which do not overlap semantically automatically without supervision. Extensive experiments show that ExtRA is effective and achieves the state-of-the-art performance on a dataset consisting of different product types.

Knowledge Base Question Answering via Encoding of Complex Query Graphs
Kangqi Luo | Fengli Lin | Xusheng Luo | Kenny Zhu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Answering complex questions that involve multiple entities and multiple relations using a standard knowledge base is an open and challenging task. Most existing KBQA approaches focus on simpler questions and do not work very well on complex questions because they were not able to simultaneously represent the question and the corresponding complex query structure. In this work, we encode such complex query structure into a uniform vector representation, and thus successfully capture the interactions between individual semantic components within a complex question. This approach consistently outperforms existing methods on complex questions while staying competitive on simple questions.

Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text
Junjie Xing | Kenny Zhu | Shaodian Zhang
Proceedings of the 27th International Conference on Computational Linguistics

Chinese word segmentation (CWS) trained from open source corpus faces dramatic performance drop when dealing with domain text, especially for a domain with lots of special terms and diverse writing styles, such as the biomedical domain. However, building domain-specific CWS requires extremely high annotation cost. In this paper, we propose an approach by exploiting domain-invariant knowledge from high resource to low resource domains. Extensive experiments show that our model achieves consistently higher accuracy than the single-task CWS and other transfer learning baselines, especially when there is a large disparity between source and target domains.

2017

Multi-channel BiLSTM-CRF Model for Emerging Named Entity Recognition in Social Media
Bill Y. Lin | Frank Xu | Zhiyi Luo | Kenny Zhu
Proceedings of the 3rd Workshop on Noisy User-generated Text

In this paper, we present our multi-channel neural architecture for recognizing emerging named entity in social media messages, which we applied in the Novel and Emerging Named Entity Recognition shared task at the EMNLP 2017 Workshop on Noisy User-generated Text (W-NUT). We propose a novel approach, which incorporates comprehensive word representations with multi-channel information and Conditional Random Fields (CRF) into a traditional Bidirectional Long Short-Term Memory (BiLSTM) neural network without using any additional hand-craft features such as gazetteers. In comparison with other systems participating in the shared task, our system won the 2nd place.

2016

Towards Non-projective High-Order Dependency Parser
Wenjing Fang | Kenny Zhu | Yizhong Wang | Jia Tan
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This paper presents a novel high-order dependency parsing framework that targets non-projective treebanks. It imitates how a human parses sentences in an intuitive way. At every step of the parse, it determines which word is the easiest to process among all the remaining words, identifies its head word and then folds it under the head word. Further, this work is flexible enough to be augmented with other parsing techniques.

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Shi Feng | Shujie Liu | Nan Yang | Mu Li | Ming Zhou | Kenny Q. Zhu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence. However, without dedicated distortion and fertility models seen in traditional SMT systems, the learned alignment may not be accurate, which can lead to low translation quality. In this paper, we propose two novel models to improve attention-based neural machine translation. We propose a recurrent attention mechanism as an implicit distortion model, and a fertility conditioned decoder as an implicit fertility model. We conduct experiments on large-scale Chinese–English translation tasks. The results show that our models significantly improve both the alignment and translation quality compared to the original attention mechanism and several other variations.

2015

Inferring Binary Relation Schemas for Open Information Extraction
Kangqi Luo | Xusheng Luo | Kenny Zhu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2013

Data-Driven Metaphor Recognition and Explanation
Hongsong Li | Kenny Q. Zhu | Haixun Wang
Transactions of the Association for Computational Linguistics, Volume 1

Recognizing metaphors and identifying the source-target mappings is an important task as metaphorical text poses a big challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using the knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in the text. To our knowledge, this is the first purely data-driven approach of probabilistic metaphor acquisition, recognition, and explanation. Our results shows that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.

Co-authors

Bill Yuchen Lin 4

Zhiling Zhang 4

Chunhao Zhang 3

Shanshan Huang 2

Theron S. Wang 2

Fengjiao Chen 1

Tuan Minh Dang 1

Seung-Won Hwang 1

Hridayesh Lekhak 1

Shaodian Zhang 1

Venues