Kezhi Mao


2024

PromptExplainer: Explaining Language Models through Prompt-based Learning
Zijian Feng | Hanzhang Zhou | Zixiao Zhu | Kezhi Mao
Findings of the Association for Computational Linguistics: EACL 2024

Pretrained language models have become workhorses for various natural language processing (NLP) tasks, sparking a growing demand for enhanced interpretability and transparency. However, prevailing explanation methods, such as attention-based and gradient-based strategies, largely rely on linear approximations, potentially causing inaccuracies such as accentuating irrelevant input tokens. To mitigate the issue, we develop PromptExplainer, a novel method for explaining language models through prompt-based learning. PromptExplainer aligns the explanation process with the masked language modeling (MLM) task of pretrained language models and leverages the prompt-based learning framework for explanation generation. It disentangles token representations into the explainable embedding space using the MLM head and extracts discriminative features with a verbalizer to generate class-dependent explanations. Extensive experiments demonstrate that PromptExplainer significantly outperforms state-of-the-art explanation methods.

EDEntail: An Entailment-based Few-shot Text Classification with Extensional Definition
Zixiao Zhu | Junlang Qian | Zijian Feng | Hanzhang Zhou | Kezhi Mao
Findings of the Association for Computational Linguistics: NAACL 2024

Few-shot text classification has seen significant advancements, particularly with entailment-based methods, which typically use either class labels or intensional definitions of class labels in hypotheses for label semantics expression. In this paper, we propose EDEntail, a method that employs extensional definition (EDef) of class labels in hypotheses, aiming to express the semantics of class labels more explicitly. To achieve the above goal, we develop an algorithm to gather and select extensional descriptive words of class labels and then order and format them into a sequence to form hypotheses. Our method has been evaluated and compared with state-of-the-art models on five classification datasets. The results demonstrate that our approach surpasses the supervised-learning methods and prompt-based methods under the few-shot setting, which underlines the potential of using an extensional definition of class labels for entailment-based few-shot text classification. Our code is available at https://github.com/MidiyaZhu/EDEntail.

FreeCtrl: Constructing Control Centers with Feedforward Layers for Learning-Free Controllable Text Generation
Zijian Feng | Hanzhang Zhou | Kezhi Mao | Zixiao Zhu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Controllable text generation (CTG) seeks to craft texts adhering to specific attributes, traditionally employing learning-based techniques such as training, fine-tuning, or prefix-tuning with attribute-specific datasets. These approaches, while effective, demand extensive computational and data resources. In contrast, some proposed learning-free alternatives circumvent learning but often yield inferior results, exemplifying the fundamental machine learning trade-off between computational expense and model efficacy. To overcome these limitations, we propose FreeCtrl, a learning-free approach that dynamically adjusts the weights of selected feedforward neural network (FFN) vectors to steer the outputs of large language models (LLMs). FreeCtrl hinges on the principle that the weights of different FFN vectors influence the likelihood of different tokens appearing in the output. By identifying and adaptively adjusting the weights of attribute-related FFN vectors, FreeCtrl can control the output likelihood of attribute keywords in the generated content. Extensive experiments on single- and multi-attribute control reveal that the learning-free FreeCtrl outperforms other learning-free and learning-based methods, successfully resolving the dilemma between learning costs and model performance.

LLMs Learn Task Heuristics from Demonstrations: A Heuristic-Driven Prompting Strategy for Document-Level Event Argument Extraction
Hanzhang Zhou | Junlang Qian | Zijian Feng | Lu Hui | Zixiao Zhu | Kezhi Mao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this study, we explore in-context learning (ICL) in document-level event argument extraction (EAE) to alleviate the dependency on large-scale labeled data for this task. We introduce the Heuristic-Driven Link-of-Analogy (HD-LoA) prompting tailored for the EAE task. Specifically, we hypothesize and validate that LLMs learn task-specific heuristics from demonstrations in ICL. Building upon this hypothesis, we introduce an explicit heuristic-driven demonstration construction approach, which transforms the haphazard example selection process into a systematic method that emphasizes task heuristics. Additionally, inspired by the analogical reasoning of humans, we propose the link-of-analogy prompting, which enables LLMs to process new situations by drawing analogies to known situations, enhancing their performance on unseen classes beyond limited ICL examples. Experiments show that our method outperforms existing prompting methods and few-shot supervised learning methods on document-level EAE datasets. Additionally, the HD-LoA prompting shows effectiveness in other tasks like sentiment analysis and natural language inference, demonstrating its broad adaptability.

2023

Closed Boundary Learning for Classification Tasks with the Universum Class
Hanzhang Zhou | Zijian Feng | Kezhi Mao
Findings of the Association for Computational Linguistics: EMNLP 2023

The Universum class, often known as the *other* class or the *miscellaneous* class, is defined as a collection of samples that do not belong to any class of interest. It is a typical class that exists in many classification-based tasks in NLP, such as relation extraction, named entity recognition, sentiment analysis, etc. The Universum class exhibits very different properties, namely heterogeneity and lack of representativeness in training data; however, existing methods often treat the Universum class equally with the classes of interest, leading to problems such as overfitting, misclassification, and diminished model robustness. In this work, we propose a closed boundary learning method that applies closed decision boundaries to classes of interest and designates the area outside all closed boundaries in the feature space as the space of the Universum class. Specifically, we formulate closed boundaries as arbitrary shapes, propose the inter-class rule-based probability estimation for the Universum class to cater to its unique properties, and propose a boundary learning loss to adjust decision boundaries based on the balance of misclassified samples inside and outside the boundary. In adherence to the natural properties of the Universum class, our method enhances both accuracy and robustness of classification models, demonstrated by improvements on six state-of-the-art works across three different tasks. Our code is available at https://github.com/hzzhou01/Closed-Boundary-Learning.

2022

Document-Level Event Argument Extraction by Leveraging Redundant Information and Closed Boundary Loss
Hanzhang Zhou | Kezhi Mao
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In document-level event argument extraction, an argument is likely to appear multiple times in different expressions in the document. The redundancy of arguments underlying multiple sentences is beneficial but is often overlooked. In addition, in event argument extraction, most entities are regarded as class “others”, i.e., the Universum class, which is defined as a collection of samples that do not belong to any class of interest. The Universum class is composed of heterogeneous entities without typical common features. Classifiers trained with cross-entropy loss could easily misclassify the Universum class because of their open decision boundary. In this paper, to make use of redundant event information underlying a document, we build an entity coreference graph with the graph2token module to produce a comprehensive and coreference-aware representation for every entity and then build an entity summary graph to merge the multiple extraction results. To better classify the Universum class, we propose a new loss function to build classifiers with closed boundaries. Experimental results show that our model outperforms the previous state-of-the-art models by 3.35% in F1-score.

2019

Improving Relation Extraction with Knowledge-attention
Pengfei Li | Kezhi Mao | Xuefeng Yang | Qi Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

While attention mechanisms have been proven to be effective in many NLP tasks, the majority of them are data-driven. We propose a novel knowledge-attention encoder which incorporates prior knowledge from external lexical resources into deep neural networks for the relation extraction task. Furthermore, we present three effective ways of integrating knowledge-attention with self-attention to maximize the utilization of both knowledge and data. The proposed relation extraction system is end-to-end and fully attention-based. Experimental results show that the proposed knowledge-attention mechanism has complementary strengths with self-attention, and our integrated models outperform existing CNN, RNN, and self-attention based models. State-of-the-art performance is achieved on TACRED, a complex and large-scale relation extraction dataset.