Wang Mingwen

Also published as: 王明文


2024

基于两阶段提示学习的少样本命名实体识别(Two-Stage Prompt Learning for Few-Shot Named Entity Recognition)
Shao Jiaxing (邵佳兴) | Huang Qi (黄琪) | Xiao Cong (肖聪) | Liu Jing (刘璟) | Luo Wenbing (罗文兵) | Wang Mingwen (王明文)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“Few-shot named entity recognition aims to identify named entities with only a small amount of labeled data. Inspired by the strong performance of prompt learning in few-shot scenarios, this paper explores a prompt-based approach to few-shot named entity recognition. Existing prompt-learning methods enumerate all possible spans to recognize entities, which incurs high computational cost and under-utilizes entity boundary information. This paper proposes TSP-Few, a two-stage framework based on prompt learning that performs few-shot named entity recognition without using source-domain data. In the first stage, seed spans are augmented, filtered, and expanded: the seed augmentation module lets seed spans capture richer semantic information, the seed filter removes a large number of irrelevant spans, and the seed expansion module fully exploits entity boundary information, providing high-quality candidate entity spans for entity type classification. In the second stage, a prompt-learning method predicts the category of each candidate span. In addition, to mitigate error accumulation from the span detection stage, a negative sampling strategy is introduced in the entity classification stage. Training the span detection and entity type classification tasks independently makes it easier to achieve strong performance in few-shot settings. Experiments on three benchmark datasets show that the proposed method further improves performance over state-of-the-art methods, and the results also demonstrate the effectiveness of each module of the model.”
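
A minimal sketch (in Python) of what such a two-stage pipeline can look like at inference time, under the assumption that stage one yields candidate entity spans and stage two fills a cloze-style prompt per span; the function names, the toy span heuristic, and the prompt template are illustrative stand-ins, not the paper's implementation.

# Illustrative two-stage few-shot NER pipeline in the spirit of TSP-Few.
# All components here are simplified stand-ins (assumptions), not the paper's code:
# stage 1 produces candidate entity spans, stage 2 classifies each span with a prompt.

from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def detect_candidate_spans(tokens: List[str]) -> List[Span]:
    """Stage 1 (stand-in): seed spans are augmented, filtered, and expanded.
    Here we simply treat every run of capitalized tokens as a candidate span."""
    spans, start = [], None
    for i, tok in enumerate(tokens + [""]):  # sentinel closes a trailing run
        if tok[:1].isupper():
            start = i if start is None else start
        elif start is not None:
            spans.append((start, i))
            start = None
    return spans


def classify_span_with_prompt(tokens: List[str], span: Span) -> str:
    """Stage 2 (stand-in): fill a cloze-style prompt such as
    '<sentence> The span "<span>" is a [MASK] entity.' and let a masked LM
    score label words; here we return a dummy label instead of calling an LM."""
    span_text = " ".join(tokens[span[0]:span[1]])
    prompt = f'{" ".join(tokens)} The span "{span_text}" is a [MASK] entity.'
    _ = prompt  # a real system would feed this prompt to a masked language model
    return "PER"


tokens = "Barack Obama visited Jiangxi Normal University".split()
for span in detect_candidate_spans(tokens):
    print(tokens[span[0]:span[1]], "->", classify_span_with_prompt(tokens, span))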

基于文本风格迁移的中文性别歧视文本去毒研究(Research on detoxification of Chinese sexist texts based on text style transfer)
Peng Jian (彭健) | Zuo Jiali (左家莉) | Tan Jingxuan (谭景璇) | Wan Jianyi (万剑怡) | Wang Mingwen (王明文)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“Online social media platforms contain a certain amount of sexist speech, which hinders a healthy internet and the development of a civil society. Text style transfer can mitigate sexism in text, and there has been considerable research on English and other languages. In Chinese, however, related work is scarce because of the lack of datasets. Moreover, since Chinese is semantically rich and linguistically diverse, the toxicity of sexist speech takes many forms, and existing methods that rely on a single text style transfer model therefore perform poorly. This paper proposes a Chinese sexist-text detoxification framework based on text style transfer: the framework first classifies a text according to how its toxicity is expressed, and then applies a different processing strategy for each form of toxicity; we also use a large language model (LLM) to build a lexicon of discriminatory terms. Experiments show that the proposed model can effectively handle sexism in Chinese text.”
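
A minimal sketch of the routing idea, i.e. classify how the toxicity is expressed and then dispatch to a matching strategy; the category names, the placeholder lexicon, and the rewrite functions below are invented for illustration and are not the paper's components.

# Illustrative routing-style detoxification: classify the form of toxicity first,
# then apply a matching strategy. Everything here is a simplified stand-in.

SEXIST_LEXICON = {"derogatory_term": "neutral_term"}  # placeholder; the paper builds its lexicon with an LLM


def classify_toxicity_form(text: str) -> str:
    """Stand-in classifier: 'explicit' if a lexicon word appears, else 'implicit'."""
    return "explicit" if any(word in text for word in SEXIST_LEXICON) else "implicit"


def rewrite_with_style_transfer(text: str) -> str:
    """Placeholder for a seq2seq style-transfer model."""
    return text  # a real system would generate a non-sexist paraphrase here


def detoxify(text: str) -> str:
    form = classify_toxicity_form(text)
    if form == "explicit":
        # explicit toxicity: replace lexicon terms directly
        for bad, neutral in SEXIST_LEXICON.items():
            text = text.replace(bad, neutral)
        return text
    # implicit toxicity: hand the sentence to a rewriting model
    return rewrite_with_style_transfer(text)


print(detoxify("an example sentence containing a derogatory_term"))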

PPDAC: A Plug-and-Play Data Augmentation Component for Few-shot Extractive Question Answering
Huang Qi | Fu Han | Luo Wenbin | Wang Mingwen | Luo Kaiwei
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“Extractive Question Answering (EQA) in the few-shot learning scenario is one of the most challenging tasks of Machine Reading Comprehension (MRC). Some previous works employ external knowledge for data augmentation to improve the performance of few-shot extractive question answering. However, external knowledge is not always available, nor are the language- and domain-specific NLP tools needed to process it, such as part-of-speech taggers, syntactic parsers, and named-entity recognizers. In this paper, we present a novel Plug-and-Play Data Augmentation Component (PPDAC) for few-shot extractive question answering, which includes a paraphrase generator and a paraphrase selector. Specifically, we generate multiple paraphrases of the question in the (question, passage, answer) triples using the paraphrase generator and then obtain highly similar statements via the paraphrase selector to form more training data for fine-tuning. Extensive experiments on multiple EQA datasets show that our proposed plug-and-play data augmentation component significantly improves question-answering performance, and consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.”
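
A minimal sketch of a PPDAC-style augmentation loop, assuming a generic paraphrase generator and a similarity-based selector; the stand-in generator, the surface-similarity measure, and the threshold are illustrative choices, not the components used in the paper.

# Generate several paraphrases of each question, keep only those sufficiently
# similar to the original, and append the new (question, passage, answer)
# triples to the training set.

from difflib import SequenceMatcher
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (question, passage, answer)


def generate_paraphrases(question: str, n: int = 3) -> List[str]:
    """Stand-in for a paraphrase generator (e.g. a fine-tuned seq2seq model)."""
    return [f"{question} (paraphrase {i})" for i in range(n)]


def select_paraphrases(question: str, candidates: List[str], threshold: float = 0.6) -> List[str]:
    """Stand-in selector: keep candidates whose surface similarity to the original
    exceeds a threshold; a real selector would use a learned similarity model."""
    return [c for c in candidates if SequenceMatcher(None, question, c).ratio() >= threshold]


def augment(triples: List[Triple]) -> List[Triple]:
    augmented = list(triples)
    for question, passage, answer in triples:
        for paraphrase in select_paraphrases(question, generate_paraphrases(question)):
            augmented.append((paraphrase, passage, answer))
    return augmented


train = [("Who wrote Hamlet?", "Hamlet is a tragedy written by William Shakespeare.", "William Shakespeare")]
print(len(augment(train)), "training triples after augmentation")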

基于增量预训练与外部知识的古文历史事件检测 (Historical Event Detection in Ancient Chinese Texts Based on Incremental Pre-training and External Knowledge)
Kang Wenjun (康文军) | Zuo Jiali (左家莉) | Hu Yiyu (胡益裕) | Wang Mingwen (王明文)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“The task of historical event detection in ancient Chinese texts aims to identify event trigger words and event types in text. To address the cascading error propagation of traditional pipeline methods, and the fact that most event detection methods rely only on sentence-level information, this paper proposes EIGC, a joint extraction model that combines external information with a global correspondence matrix to extract triggers and event types precisely. In addition, we compile a dataset of ancient Chinese documents, including the Twenty-Four Histories, totaling about 970,000 ancient-Chinese text segments, and use it to incrementally pre-train BERT-Ancient-Chinese. The proposed model achieves an overall F1 of 76.2% across the three tasks, demonstrating the effectiveness of the approach.”
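
For the incremental pre-training step, a minimal sketch of continued masked-language-model training with Hugging Face transformers is shown below; the checkpoint identifier, the two-sentence toy corpus, and the training hyperparameters are assumptions for illustration, not the paper's actual configuration or data.

# Continued (incremental) MLM pre-training sketch. The hub id below is an assumed
# name for BERT-Ancient-Chinese; the paper continues pre-training on roughly
# 970k ancient-Chinese text segments rather than this placeholder corpus.

from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "Jihuai/bert-ancient-chinese"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

corpus = ["初，郑武公娶于申，曰武姜。", "秦王扫六合，虎视何雄哉。"]  # placeholder sentences
encodings = tokenizer(corpus, truncation=True, max_length=128)
dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-ancient-chinese-cpt",
                         per_device_train_batch_size=2, num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()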