Peijie Huang (黄沛杰)

Also published as: 沛杰黄

2025

ECLM: Entity Level Language Model for Spoken Language Understanding with Chain of Intent
Shangjian Yin | Peijie Huang | JiaTian Chen | Haojing Huang | Yuhong Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) have demonstrated impressive capabilities in language generation and general task performance. However, their application to spoken language understanding (SLU) remains challenging, particularly for token-level tasks, where the autoregressive nature of LLMs often leads to misalignment issues. They also struggle to capture nuanced interrelations in semantic-level tasks through direct fine-tuning alone. To address these challenges, we propose the Entity-level Language Model (ECLM) framework, which reformulates slot-filling as an entity recognition task and introduces a novel concept, Chain of Intent, to enable step-by-step multi-intent recognition. Experimental results show that ECLM significantly outperforms strong baselines such as Uni-MIS, achieving gains of 3.7% on MixATIS and 3.1% on MixSNIPS. Compared to standard supervised fine-tuning of LLMs, ECLM further achieves improvements of 8.5% and 21.2% on these datasets, respectively. Our code is available at https://github.com/SJY8460/ECLM.

MIDLM: Multi-Intent Detection with Bidirectional Large Language Models
Shangjian Yin | Peijie Huang | Yuhong Xu
Proceedings of the 31st International Conference on Computational Linguistics

Decoder-only Large Language Models (LLMs) have demonstrated exceptional performance in language generation, exhibiting broad capabilities across various tasks. However, their application to label-sensitive language understanding tasks remains challenging due to the limitations of their autoregressive architecture, which restricts the sharing of token information within a sentence. In this paper, we address the Multi-Intent Detection (MID) task and introduce MIDLM, a bidirectional LLM framework that incorporates intent number detection and multi-intent selection. This framework allows autoregressive LLMs to leverage bidirectional information awareness through post-training, eliminating the need for training the models from scratch. Comprehensive evaluations across eight datasets show that MIDLM consistently outperforms both existing vanilla models and pretrained baselines, demonstrating its superior performance in the MID task.

Synergistic Augmentation: Enhancing Cross-Domain Zero-Shot Slot Filling with Small Model-Assisted Large Language Models
Weizhen Li | Junbao Huang | Peijie Huang | Yuhong Xu | Jiekun Fan
Findings of the Association for Computational Linguistics: ACL 2025

In real-world scenarios, cross-domain slot filling in spoken language understanding remains a significant challenge due to data scarcity. Previous works exhibit limited generalization ability in the target domain, demonstrating effective knowledge transfer only on seen slots while performing poorly on unseen slots. Although large language models (LLMs) can alleviate this issue to some extent, they underperform on seen slots compared to small models. To address these challenges, we introduce a novel framework that harnesses the power of a small model to augment the inferential capabilities of LLMs without additional training. Initially, we utilize target domain samples synthesized by LLMs as pre-calculated demonstrations, which are curated and chosen using confidence metrics derived from a small model. We further extract slot predictions from the small model to fully exploit its robust learning of familiar slots. Finally, during the inference process for test inputs, we integrate these demonstrations and slot prediction insights as references to enhance the slot filling performance of LLMs. Experiments on a slot filling dataset and a NER dataset including eight cross-domain settings show our framework achieves the best results. Our codes are publicly available at https://github.com/SIGSDSscau/SLSF.

From Noise to Clarity: Filtering Real and LLM-Generated Samples for Enhanced Intent Detection
Junbao Huang | Weizhen Li | Peijie Huang | Yuhong Xu
Findings of the Association for Computational Linguistics: EMNLP 2025

In dialogue intent detection, the challenge of acquiring sufficient corpora and the high cost of manual annotation often lead to incorrectly labeled or unrepresentative samples, which can hinder the generalization ability of classification models. Additionally, as using large language models for generating synthetic samples for data augmentation becomes more common, these synthetic samples may exacerbate the problem by introducing additional noise due to the models’ limited prior knowledge. To address this challenge, this paper proposes an interpretable Sample Filter by Topic Modeling (SFTM) framework. By evaluating the diversity and authenticity of the samples, SFTM effectively reduces the quantity of real and synthetic samples while improving the performance of the classification models. Our codes are publicly available at https://github.com/gumbouh/SFTM.

2024

DMIN: A Discourse-specific Multi-granularity Integration Network for Conversational Aspect-based Sentiment Quadruple Analysis
Peijie Huang | Xisheng Xiao | Yuhong Xu | Jiawei Chen
Findings of the Association for Computational Linguistics: ACL 2024

Conversational Aspect-based Sentiment Quadruple Analysis (DiaASQ) aims to extract fine-grained sentiment quadruples from dialogues. Previous research has primarily concentrated on enhancing token-level interactions, still lacking in sufficient modeling of the discourse structure information in dialogue. Firstly, it does not incorporate interactions among different utterances in the encoding stage, resulting in a limited token-level context understanding for subsequent modules. Secondly, it ignores the critical fact that discourse information is naturally organized at the utterance level and learning it solely at the token level is incomplete. In this work, we strengthen the token-level encoder by utilizing a discourse structure called “thread” and graph convolutional networks to enhance the token interaction among different utterances. Moreover, we propose an utterance-level encoder to learn the structured speaker and reply information, providing a macro understanding of dialogue discourse. Furthermore, we introduce a novel Multi-granularities Integrator to integrate token-level and utterance-level representations, resulting in a comprehensive and cohesive dialogue contextual understanding. Experiments on two datasets demonstrate that our model achieves state-of-the-art performance. Our codes are publicly available at https://github.com/SIGSDSscau/DMIN.

Logits Reranking via Semantic Labels for Hard Samples in Text Classification
Peijie Huang | Junbao Huang | Yuhong Xu | Weizhen Li | Xisheng Xiao
Findings of the Association for Computational Linguistics: EMNLP 2024

Pre-trained Language Models (PLMs) have achieved significant success in text classification. However, they still face challenges with hard samples, which refer to instances where the model exhibits diminished confidence in distinguishing new samples. Existing research has addressed related issues, but often overlooks the semantic information inherent in the labels, treating them merely as one-hot vectors. In this paper, we propose Logits Reranking via Semantic Labels (LRSL), a model-agnostic post-processing method that leverages label semantics and auto detection of hard samples to improve classification accuracy. LRSL automatically identifies hard samples, which are then jointly processed by MLP-based and Similarity-based approaches. Applied only during inference, LRSL operates solely on classification logits, reranking them based on semantic similarities without interfering with the model’s training process. The experiments demonstrate the effectiveness of our method, showing significant improvements across different PLMs. Our codes are publicly available at https://github.com/SIGSDSscau/LRSL.

2023

基于多意图融合框架的联合意图识别和槽填充(A Multi-Intent Fusion Framework for Joint Intent Detection and Slot Filling)
Shangjian Yin (尹商鉴) | Peijie Huang (黄沛杰) | Dongzhu Liang (梁栋柱) | Zhuoqi He (何卓棋) | Qianer Li (黎倩尔) | Yuhong Xu (徐禹洪)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

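In recent years, multi-intent spoken language understanding (SLU) has become a research hotspot in natural language processing. Current state-of-the-art multi-intent SLU models adopt a graph-interactive framework for joint multi-intent detection and slot filling, which effectively captures fine-grained intent information for the token-level slot filling task and achieves good performance. However, this approach ignores the rich information carried by intents acting jointly and does not fully exploit multi-intent information to guide slot filling. To this end, this paper proposes a joint multi-intent detection and slot filling framework based on a Multi-Intent Fusion Framework (MIFF), which enables the model to identify different intents accurately while using intent information to provide fuller guidance for the slot filling task. We conduct experiments on two public datasets, MixATIS and MixSNIPS, and the results show that our model surpasses the current state-of-the-art methods in both performance and efficiency, and generalizes effectively from single-domain datasets to multi-domain datasets.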

基于互信息最大化和对比损失的多模态对话情绪识别模型(Multimodal Emotion Recognition in Conversation with Mutual Information Maximization and Contrastive Loss)
Qianer Li (黎倩尔) | Peijie Huang (黄沛杰) | Jiawei Chen (陈佳炜) | Jialin Wu (吴嘉林) | Yuhong Xu (徐禹洪) | Peiyuan Lin (林丕源)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

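Multimodal emotion recognition in conversation (ERC) is key to building affective dialogue systems. In recent years, graph-based fusion methods, which dynamically aggregate multimodal contextual features within a conversation, have improved model performance on multimodal ERC. However, none of these methods fully preserves and exploits the valuable information in the input data. Specifically, they do not preserve task-relevant information from the input through to the fusion result, and they ignore the information contained in the labels themselves. This paper proposes MMIC, a multimodal ERC model based on mutual information maximization and contrastive loss, to address these problems. By hierarchically maximizing the mutual information between modalities at the input level and the fusion level, the model preserves task-relevant information during fusion and thus produces richer multimodal representations. The paper also introduces supervised contrastive learning into the graph-based dynamic fusion network; by making full use of the information contained in the labels, it pushes different emotions apart and strengthens the model's ability to distinguish similar emotions. Extensive experiments on two English public datasets and one Chinese public dataset demonstrate the effectiveness and superiority of the proposed model. In addition, a case study on the proposed model confirms that it preserves task-relevant information and better distinguishes similar emotions. Ablation experiments and visualization results demonstrate the effectiveness of each module in the model.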

2021

面向中文口语理解的基于依赖引导的字特征槽填充模型(A Dependency-Guided Character-Based Slot Filling Model for Chinese Spoken Language Understanding)
Zhanbiao Zhu (朱展标) | Peijie Huang (黄沛杰) | Yexing Zhang (张业兴) | Shudong Liu (刘树东) | Hualin Zhang (张华林) | Junyao Huang (黄均曜)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

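Joint models of intent detection and slot filling have raised spoken language understanding (SLU) to a new level. However, because of low-frequency or unseen slot mentions (0-shot slot mentions), the sequence labeling performance of these models is limited, and such joint models often do not exploit the syntactic knowledge present in the input sequence. Prior work has shown that sequence labeling tasks can introduce dependency tree structures to help infer the presence of slots. In Chinese spoken dialogue understanding, an utterance is a sequence of characters, and the characters of the input utterance correspond one-to-one with slot information, so slot filling models are usually character-based. Word-based dependency tree structures cannot be applied directly to character-based slot filling models. To resolve this mismatch between characters and words, this paper proposes a dependency-guided character-based slot filling model (DCSF), which offers a concise way to introduce word-level dependency tree structures into Chinese character-based models, while preserving word-level context and word segmentation information by modeling the internal relations of words in the utterance. Experimental results on the public benchmark corpora SMP-ECDT and CrossWOZ show that our model outperforms the compared models, with especially large improvements on unseen slot mentions and in low-resource settings.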

结合边界预测和动态模板方法的槽填充模型(Slot Filling Model with Boundary Prediction and Dynamic Template)
Zhanbiao Zhu (朱展标) | Peijie Huang (黄沛杰) | Yexing Zhang (张业兴) | Shudong Liu (刘树东) | Hualin Zhang (张华林) | Junyao Huang (黄均曜)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

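Joint models of intent detection and slot filling have raised spoken language understanding (SLU) to a new level. However, current state-of-the-art models determine position information from the utterance context and do not consider the positional information among slot labels, which makes them prone to boundary errors during slot extraction and in turn hurts final slot extraction performance. Moreover, in slot extraction, slot mentions may be indistinguishable from ordinary utterance wording, especially movie titles, song titles, and the like; models are easily distracted by such slot mentions and thus fail to identify slot boundaries correctly. This paper proposes a Boundary-prediction and Dynamic-template Slot Filling (BDSF) model for spoken language understanding. The model provides an auxiliary task that jointly predicts boundary information, introducing position information into slot filling, and uses a dynamic template mechanism to model the sentence pattern of the utterance. This lets the model focus on the non-slot-mention parts of the utterance, avoids interference from slot mentions, and strengthens the model's ability to distinguish slot boundaries. Experimental results on the public benchmark corpora CAIS and SMP-ECDT show that our model outperforms the compared models, and in particular provides accurate position information for the slot label prediction model.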

基于堆叠式注意力网络的复杂话语领域分类方法(Complex Utterance Domain Classification Using Stacked Attention Networks)
Chaojie Liang (梁超杰) | Peijie Huang (黄沛杰) | Jiande Ding (丁健德) | Jiankai Zhu (朱建恺) | Piyuan Lin (林丕源)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

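Utterance domain classification (UDC) is a key step of semantic analysis in spoken language understanding (SLU). Although recurrent neural networks with attention mechanisms have been widely applied and have raised UDC research to a new level, effective UDC remains a challenge for complex utterances, such as long utterances or compound sentences containing commas. This paper proposes SAN-DC (stacked attention networks-DC), an utterance domain classification method based on stacked attention networks. The model combines the capture of linguistic features of spoken utterances at multiple levels to strengthen the understanding of complex utterances. First, contextualized word embeddings are used at the bottom of the model to obtain good lexical feature representations, and at the lexical layer a long short-term memory (LSTM) network encodes the utterance into contextual vector representations. Next, at the syntactic level, a self-attention mechanism captures domain-specific word dependencies, and a word-attention layer then extracts semantic information. Finally, residual connections pass low-level linguistic information to higher layers, achieving better fusion of multi-level linguistic information. We validate the proposed method on SMP-ECDT, a benchmark corpus for Chinese utterance domain classification. Compared with state-of-the-art text classification models, our method achieves higher utterance domain classification accuracy. The improvement over state-of-the-art methods is especially marked for more complex user utterances.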

2020

一种结合话语伪标签注意力的人机对话意图分类方法(A Human-machine Dialogue Intent Classification Method using Utterance Pseudo Label Attention)
Jiande Ding (丁健德) | Peijie Huang (黄沛杰) | Jiabao Xu (许嘉宝) | Youming Peng (彭佑铭)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

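In human-machine dialogue, the system needs to determine the user's intent through intent classification and then trigger the corresponding service type. Because multi-turn human-machine dialogue is colloquial, long, and feature-sparse, existing text classification methods still struggle with dialogue intent classification. Building on hierarchical attention networks (HAN), this paper proposes PLA-HAN (HAN with utterance pseudo label attention), a hierarchical attention network model that incorporates utterance pseudo-label attention. PLA-HAN selects a preferred pseudo-label set, builds a single-utterance intent recognition model, and designs an utterance pseudo-label attention mechanism in order to identify pseudo intent labels for single utterances and compute utterance pseudo-label attention. The single-utterance pseudo-label attention is then embedded into the hierarchical structure of HAN and fused with HAN's sentence-level attention. Sentence-level attention fused with single-utterance intent information further improves the overall performance of the model. We conduct experiments on the evaluation corpus of the "Customer Service Domain User Intent Classification Evaluation" competition hosted by the Chinese Information Processing Society of China, and the results show that PLA-HAN achieves better intent classification performance than HAN and the other compared methods.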