Feng Zhang


2024

StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
Boxi Cao | Mengjie Ren | Hongyu Lin | Xianpei Han | Feng Zhang | Junfeng Zhan | Le Sun
Findings of the Association for Computational Linguistics: ACL 2024

Evaluation is the baton for the development of large language models. Current evaluations typically employ a single-item assessment paradigm for each atomic test objective, which struggles to discern whether a model genuinely possesses the required capabilities or merely memorizes or guesses the answers to specific questions. To this end, this paper proposes a novel evaluation framework referred to as StructEval. Starting from an atomic test objective, StructEval deepens and broadens the evaluation by conducting a structured assessment across multiple cognitive levels and critical concepts, and therefore offers a comprehensive, robust and consistent evaluation of large language models. Experiments on three widely used benchmarks demonstrate that StructEval serves as a reliable tool for resisting the risk of data contamination and reducing the interference of potential biases, thereby providing more reliable and consistent conclusions regarding model capabilities. Our framework also sheds light on the design of future principled and trustworthy LLM evaluation protocols.
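
A minimal sketch (not the authors' released code) of one way such consistency-oriented scoring can be realized: a model is credited with an atomic test objective only if it answers every item derived from that objective, with the items spanning several cognitive levels. All names and structures here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Item:
    question: str
    answer: str
    level: str      # e.g. "remember", "understand", "apply"

def structured_accuracy(
    objectives: Dict[str, List[Item]],     # objective id -> its structured item set
    predict: Callable[[str], str],         # model under evaluation
) -> float:
    """Fraction of objectives for which the model answers ALL derived items correctly."""
    passed = 0
    for obj_id, items in objectives.items():
        if all(predict(it.question).strip() == it.answer for it in items):
            passed += 1
    return passed / max(len(objectives), 1)
```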

SELP: A Semantically-Driven Approach for Separated and Accurate Class Prototypes in Few-Shot Text Classification
Wenxin Liang | Tingyu Zhang | Han Liu | Feng Zhang
Findings of the Association for Computational Linguistics: ACL 2024

The meta-learning paradigm has demonstrated significant effectiveness in few-shot text classification. Currently, numerous efforts are grounded in metric-based learning, utilizing textual feature vectors for classification, with a common emphasis on enlarging inter-class distances to achieve improved classification effectiveness. However, many methods predominantly focus on enhancing the separation of prototypes without taking the semantic relationships between prototypes and class clusters into consideration. This oversight results in incomplete and inaccurate encoding of prototypes within the semantic space, affecting the generality of the learned metric space. In this paper, we propose the utilization of Semantically Enhanced Labels for calibrating class Prototypes (SELP), thereby obtaining prototypes that are more separated and semantically accurate. Additionally, we have devised a center loss to enhance intra-class compactness, coupled with the introduction of a simulated label distribution method to address the overfitting problem. Extensive experiments on eight few-shot text classification datasets show that the proposed method outperforms baselines significantly. Our code is available at https://github.com/tttyyyzzz-zty/SELP.git.
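
A minimal sketch (not the released SELP code) of a center loss that pulls each embedded utterance toward its class prototype to improve intra-class compactness; tensor names and shapes are illustrative assumptions.

```python
import torch

def center_loss(features: torch.Tensor,      # (N, d) utterance embeddings
                labels: torch.Tensor,        # (N,) class indices
                prototypes: torch.Tensor     # (C, d) class prototypes
                ) -> torch.Tensor:
    centers = prototypes[labels]             # (N, d) prototype of each sample's class
    return ((features - centers) ** 2).sum(dim=1).mean()
```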

From Discrimination to Generation: Low-Resource Intent Detection with Language Model Instruction Tuning
Feng Zhang | Wei Chen | Fei Ding | Meng Gao | Tengjiao Wang | Jiahui Yao | Jiabin Zheng
Findings of the Association for Computational Linguistics: ACL 2024

Intent detection aims to identify user goals from utterances and is a ubiquitous step toward satisfying users' needs in many interaction systems. As dynamic and varied intents arise, models that can identify new intents promptly are required. However, existing studies usually fine-tune discriminative models on specific, predefined intent classes, precluding them from being directly adopted in new intent domains. In this paper, we introduce a generative pre-trained intent model that can recognize new intents from different domains in low-resource scenarios. We reformulate intent detection as a generation task and design descriptive and regularized instructions to guide the model to detect new intents in open domains with no parameter updates. To validate the proposed method, we introduce a new intent detection benchmark, including the Meta-Intent Dataset and three types of representative evaluation settings. We conduct extensive experiments which demonstrate that our method outperforms a range of strong baselines that require further fine-tuning or domain-specific samples.
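
A hypothetical instruction template in the spirit described above (the paper's actual wording and format may differ): candidate intents and short descriptions are listed, and the model is asked to generate the matching intent name.

```python
def build_prompt(utterance: str, intents: dict) -> str:
    lines = ["You are an intent classifier. Choose exactly one intent from the list."]
    for name, desc in intents.items():
        lines.append(f"- {name}: {desc}")              # descriptive instruction per intent
    lines.append(f'Utterance: "{utterance}"')
    lines.append("Intent:")
    return "\n".join(lines)

prompt = build_prompt(
    "Can you move my dentist appointment to Friday?",
    {"reschedule_appointment": "change the time of an existing booking",
     "cancel_appointment": "cancel an existing booking"},
)
```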

2023

Dual Class Knowledge Propagation Network for Multi-label Few-shot Intent Detection
Feng Zhang | Wei Chen | Fei Ding | Tengjiao Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multi-label intent detection aims to assign multiple labels to utterances and attracts increasing attention as a practical task in task-oriented dialogue systems. As dialogue domains change rapidly and new intents emerge quickly, the lack of annotated data motivates multi-label few-shot intent detection. However, previous studies are hampered by the identical representation assigned to an utterance with multiple labels and overlook the intrinsic intra-class and inter-class interactions. To address these two limitations, we propose a novel dual class knowledge propagation network. To learn well-separated representations for utterances with multiple intents, we first introduce a label-semantic augmentation module that incorporates class name information. To better account for the inherent intra-class and inter-class relations, we construct an instance-level and a class-level graph neural network, which propagate not only label information but also feature structure. We also use a simple yet effective method to predict the intent count of each utterance. Extensive experimental results on two multi-label intent datasets demonstrate that our proposed method outperforms strong baselines by a large margin.
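
An illustrative one-step propagation on a class-level graph (not the authors' implementation): class prototypes exchange information along an adjacency matrix, e.g. built from label co-occurrence or prototype similarity. Shapes and names are assumptions.

```python
import torch
import torch.nn as nn

class ClassGraphLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, prototypes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # prototypes: (C, d) class prototypes; adj: (C, C) adjacency with self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        propagated = (adj / deg) @ prototypes                    # row-normalized propagation
        return torch.relu(self.proj(propagated)) + prototypes    # residual update
```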

Type Enhanced BERT for Correcting NER Errors
Kuai Li | Chen Chen | Tao Yang | Tianming Du | Peijie Yu | Dong Du | Feng Zhang
Findings of the Association for Computational Linguistics: ACL 2023

We introduce the task of correcting named entity recognition (NER) errors without re-training the model. After an NER model is trained and deployed in production, it makes prediction errors, which usually need to be fixed quickly. To address this problem, we first construct a gazetteer containing named entities and their possible entity types. We then propose type enhanced BERT (TyBERT), a method that integrates a named entity's type information into BERT via an adapter layer. When errors are identified, we can repair the model by updating the gazetteer. In other words, the gazetteer becomes a trigger to control the NER model's output. Experimental results on multiple corpora show the effectiveness of our method, which outperforms strong baselines.
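
An illustrative bottleneck adapter (not necessarily the paper's exact architecture) that injects a gazetteer-derived type embedding into each token's hidden state; dimensions and the type-lookup logic are assumptions.

```python
import torch
import torch.nn as nn

class TypeAdapter(nn.Module):
    def __init__(self, hidden: int, num_types: int, bottleneck: int = 64):
        super().__init__()
        self.type_emb = nn.Embedding(num_types, hidden)   # type id from gazetteer match
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, h: torch.Tensor, type_ids: torch.Tensor) -> torch.Tensor:
        # h: (B, T, H) BERT hidden states; type_ids: (B, T) gazetteer type per token
        fused = h + self.type_emb(type_ids)
        return h + self.up(torch.relu(self.down(fused)))   # residual adapter output
```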

2021

PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction
Shulin Liu | Tao Yang | Tianchi Yue | Feng Zhang | Di Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Chinese spelling correction (CSC) is a task to detect and correct spelling errors in texts. CSC is essentially a linguistic problem, so the ability to understand language is crucial to this task. In this paper, we propose a Pre-trained masked Language model with Misspelled knowledgE (PLOME) for CSC, which jointly learns how to understand language and correct spelling errors. To this end, PLOME masks chosen tokens with similar characters according to a confusion set rather than the fixed token "[MASK]" as in BERT. Besides character prediction, PLOME also introduces pronunciation prediction to learn misspelled knowledge at the phonetic level. Moreover, phonological and visual similarity knowledge is important to this task, so PLOME utilizes GRU networks to model such knowledge based on characters' phonetics and strokes. Experiments are conducted on widely used benchmarks. Our method outperforms state-of-the-art approaches by a remarkable margin. We release the source code and pre-trained model for further use by the community (https://github.com/liushulinle/PLOME).
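
A minimal sketch (not the released PLOME code) of confusion-set masking: selected characters are replaced with phonologically or visually similar characters drawn from a confusion set, rather than with a fixed "[MASK]" token; the original characters at those positions become prediction targets.

```python
import random

def confusion_mask(tokens, confusion_set, mask_prob=0.15):
    # tokens: list of characters; confusion_set: char -> list of similar chars
    corrupted, target_positions = list(tokens), []
    for i, tok in enumerate(tokens):
        if tok in confusion_set and random.random() < mask_prob:
            corrupted[i] = random.choice(confusion_set[tok])
            target_positions.append(i)        # prediction targets: original characters
    return corrupted, target_positions
```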

Concept-Based Label Embedding via Dynamic Routing for Hierarchical Text Classification
Xuepeng Wang | Li Zhao | Bing Liu | Tao Chen | Feng Zhang | Di Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Hierarchical Text Classification (HTC) is a challenging task that categorizes a textual description within a taxonomic hierarchy. Most existing methods focus on modeling the text. Recently, researchers have attempted to model class representations with external resources (e.g., external dictionaries). However, the concepts shared among classes, a kind of domain-specific and fine-grained information, have been ignored in previous work. In this paper, we propose a novel concept-based label embedding method that can explicitly represent the concepts and model the sharing mechanism among classes for hierarchical text classification. Experimental results on two widely used datasets show that the proposed model outperforms several state-of-the-art methods. We release our complementary resources (concepts and definitions of classes) for these two datasets to benefit research on HTC.
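
An illustrative dynamic-routing step in the style of capsule routing (not the authors' exact formulation): candidate concept vectors are softly assigned to class label embeddings, and the assignments are refined by agreement over a few iterations.

```python
import torch

def route(concepts: torch.Tensor, num_labels: int, iters: int = 3) -> torch.Tensor:
    # concepts: (N, d) candidate concept vectors; returns (num_labels, d) label embeddings
    logits = torch.zeros(concepts.size(0), num_labels)
    for _ in range(iters):
        coupling = torch.softmax(logits, dim=-1)        # (N, L) soft assignment weights
        labels = coupling.t() @ concepts                # (L, d) weighted aggregation
        logits = logits + concepts @ labels.t()         # raise weights where agreement is high
    return labels
```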

TexSmart: A System for Enhanced Natural Language Understanding
Lemao Liu | Haisong Zhang | Haiyun Jiang | Yangming Li | Enbo Zhao | Kun Xu | Linfeng Song | Suncong Zheng | Botong Zhou | Dick Zhu | Xiao Feng | Tao Chen | Tao Yang | Dong Yu | Feng Zhang | ZhanHui Kang | Shuming Shi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

This paper introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities. Compared to most previous publicly available text understanding systems and tools, TexSmart holds some unique features. First, the NER function of TexSmart supports over 1,000 entity types, while most other public tools typically support several to (at most) dozens of entity types. Second, TexSmart introduces new semantic analysis functions, such as semantic expansion and deep semantic representation, that are absent in most previous systems. Third, a spectrum of algorithms (from very fast algorithms to those that are relatively slow but more accurate) is implemented for a single function in TexSmart to fulfill the requirements of different academic and industrial applications. The adoption of unsupervised or weakly supervised algorithms is especially emphasized, with the goal of easily updating our models to include fresh data with less human annotation effort.

UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction
Huanqin Wu | Wei Liu | Lei Li | Dan Nie | Tao Chen | Feng Zhang | Di Wang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

An Explicit-Joint and Supervised-Contrastive Learning Framework for Few-Shot Intent Classification and Slot Filling
Han Liu | Feng Zhang | Xiaotong Zhang | Siyang Zhao | Xianchao Zhang
Findings of the Association for Computational Linguistics: EMNLP 2021

Intent classification (IC) and slot filling (SF) are critical building blocks in task-oriented dialogue systems. These two tasks are closely related and can reinforce each other. Since only a few utterances can be utilized to identify fast-emerging new intents and slots, a data scarcity issue often arises when implementing IC and SF. However, few IC/SF models perform well when the number of training samples per class is quite small. In this paper, we propose a novel explicit-joint and supervised-contrastive learning framework for few-shot intent classification and slot filling. Its highlights are as follows. (i) The model extracts intent and slot representations via bidirectional interactions and extends the prototypical network to achieve explicit-joint learning, which guarantees that the IC and SF tasks can mutually reinforce each other. (ii) The model integrates supervised contrastive learning, which ensures that samples from the same class are pulled together and samples from different classes are pushed apart. In addition, the model follows an uncommon but practical way of constructing episodes, which departs from the traditional fixed-way, fixed-shot setting and allows for unbalanced datasets. Extensive experiments on three public datasets show that our model achieves promising performance.
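
A minimal sketch of a generic supervised contrastive loss of the kind the framework integrates (a standard Khosla-et-al.-style formulation, not the authors' exact implementation); tensor names are assumptions.

```python
import torch

def sup_con_loss(features: torch.Tensor, labels: torch.Tensor, temp: float = 0.1) -> torch.Tensor:
    # features: (N, d) L2-normalized embeddings; labels: (N,) class indices
    sim = features @ features.t() / temp                        # (N, N) pairwise similarities
    mask_self = torch.eye(len(labels), dtype=torch.bool)
    sim = sim.masked_fill(mask_self, float("-inf"))             # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # log-softmax over other samples
    log_prob = log_prob.masked_fill(mask_self, 0.0)             # avoid -inf * 0 on the diagonal
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self
    # average log-probability of positive pairs, negated
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```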

KLMo: Knowledge Graph Enhanced Pretrained Language Model with Fine-Grained Relationships
Lei He | Suncong Zheng | Tao Yang | Feng Zhang
Findings of the Association for Computational Linguistics: EMNLP 2021

Interactions between entities in a knowledge graph (KG) provide rich knowledge for language representation learning. However, existing knowledge-enhanced pretrained language models (PLMs) only focus on entity information and ignore the fine-grained relationships between entities. In this work, we propose to incorporate the KG (including both entities and relations) into the language learning process to obtain a KG-enhanced pretrained language model, namely KLMo. Specifically, a novel knowledge aggregator is designed to explicitly model the interaction between entity spans in text and all entities and relations in a contextual KG. A relation prediction objective is utilized to incorporate relation information via distant supervision, and an entity linking objective is further utilized to link entity spans in text to entities in the KG. In this way, structured knowledge can be effectively integrated into language representations. Experimental results demonstrate that KLMo achieves great improvements on several knowledge-driven tasks, such as entity typing and relation classification, compared with state-of-the-art knowledge-enhanced PLMs.
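
An illustrative knowledge aggregator (shapes, head count, and module choices are assumptions, not the paper's exact design): text-side entity-span representations attend over the embeddings of entities and relations in the contextual KG and are fused back residually.

```python
import torch
import torch.nn as nn

class KnowledgeAggregator(nn.Module):
    def __init__(self, text_dim: int, kg_dim: int, num_heads: int = 4):
        super().__init__()
        # text_dim must be divisible by num_heads
        self.attn = nn.MultiheadAttention(text_dim, num_heads, kdim=kg_dim,
                                          vdim=kg_dim, batch_first=True)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, span_reprs: torch.Tensor, kg_reprs: torch.Tensor) -> torch.Tensor:
        # span_reprs: (B, S, text_dim) entity spans in text
        # kg_reprs:   (B, K, kg_dim)   entities and relations of the contextual KG
        fused, _ = self.attn(span_reprs, kg_reprs, kg_reprs)
        return self.norm(span_reprs + fused)    # residual fusion of KG knowledge
```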

2019

Event Detection without Triggers
Shulin Liu | Yang Li | Feng Zhang | Tao Yang | Xinpeng Zhou
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The goal of event detection (ED) is to detect the occurrences of events and categorize them. Previous work solved this task by recognizing and classifying event triggers, where a trigger is defined as the word or phrase that most clearly expresses an event occurrence. As a consequence, existing approaches required both annotated triggers and event types in the training data. However, triggers are nonessential to event detection, and it is time-consuming for annotators to pick out the word that "most clearly" expresses an event from a given sentence, especially a long one. The expensive annotation of the training corpus limits the application of existing approaches. To reduce manual effort, we explore detecting events without triggers. In this work, we propose a novel framework dubbed Type-aware Bias Neural Network with Attention Mechanisms (TBNNAM), which encodes the representation of a sentence based on target event types. Experimental results demonstrate the effectiveness of this approach. Remarkably, it even achieves competitive performance compared with state-of-the-art methods that use annotated triggers.
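
A minimal sketch (not the released TBNNAM code) of type-aware attention: token representations are pooled with weights conditioned on a target event type, and a binary score indicates whether an event of that type occurs in the sentence. Names and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TypeAwareAttention(nn.Module):
    def __init__(self, hidden: int, num_types: int):
        super().__init__()
        self.type_emb = nn.Embedding(num_types, hidden)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, tokens: torch.Tensor, type_id: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T, H) encoded sentence; type_id: (B,) target event type
        t = self.type_emb(type_id).unsqueeze(1)                  # (B, 1, H)
        weights = torch.softmax((tokens * t).sum(-1), dim=-1)    # (B, T) type-conditioned attention
        pooled = (weights.unsqueeze(-1) * tokens).sum(dim=1)     # (B, H) sentence representation
        return torch.sigmoid(self.scorer(pooled)).squeeze(-1)    # P(event of this type occurs)
```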