2024
Breaking the Boundaries: A Unified Framework for Chinese Named Entity Recognition Across Text and Speech
Jinzhong Ning | Yuanyuan Sun | Bo Xu | Zhihao Yang | Ling Luo | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2024
In recent years, with the vast and rapidly increasing amounts of spoken and textual data, Named Entity Recognition (NER) tasks have evolved into three distinct categories: Text-based NER (TNER), Speech NER (SNER), and Multimodal NER (MNER). However, existing approaches typically require a separate model for each task, overlooking the potential connections between tasks and limiting the versatility of NER methods. To mitigate these limitations, we introduce a new task named Integrated Multimodal NER (IMNER) that breaks the boundaries between NER tasks of different modalities and enables a unified implementation of them. To achieve this, we first design a unified data format for inputs from different modalities. Then, leveraging the pre-trained MMSpeech model as the backbone, we propose an **I**ntegrated **M**ultimod**a**l **Ge**neration Framework (**IMAGE**), formulating the Chinese IMNER task as an entity-aware text generation task. Experimental results demonstrate the feasibility of the proposed IMAGE framework on the IMNER task. Our work on integrated multimodal learning for improving NER performance may open a new direction for future research in the field. Our source code is available at https://github.com/NingJinzhong/IMAGE4IMNER.
2023
OD-RTE: A One-Stage Object Detection Framework for Relational Triple Extraction
Jinzhong Ning | Zhihao Yang | Yuanyuan Sun | Zhizheng Wang | Hongfei Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The Relational Triple Extraction (RTE) task is a fundamental information extraction task. Recently, table-filling RTE methods have received considerable attention. Despite their success, they suffer from some inherent problems, such as underutilizing the regional information of triples. In this work, we treat the table-filling RTE task as an Object Detection task and propose a one-stage Object Detection framework for Relational Triple Extraction (OD-RTE). In this framework, vertices-based bounding box detection, coupled with auxiliary global relational triple region detection, ensures that the regional information of triples is fully utilized. In addition, our proposed decoding scheme can extract all types of triples, and the negative sampling strategy for relations in the training stage improves training efficiency while alleviating the imbalance between positive and negative relations. The experimental results show that 1) OD-RTE achieves state-of-the-art performance on two widely used datasets (i.e., NYT and WebNLG), and 2) compared with the best-performing table-filling method, OD-RTE achieves faster training and inference speed with lower GPU memory usage. To facilitate future research in this area, the code is publicly available at https://github.com/NingJinzhong/ODRTE.
2022
Two Languages Are Better than One: Bilingual Enhancement for Chinese Named Entity Recognition
Jinzhong Ning | Zhihao Yang | Zhizheng Wang | Yuanyuan Sun | Hongfei Lin | Jian Wang
Proceedings of the 29th International Conference on Computational Linguistics
Chinese Named Entity Recognition (NER) has continued to attract research attention. However, most existing studies explore only the internal features of the Chinese language and neglect features from other languages. In fact, as a complementary source of knowledge for Chinese, English contains rich cues about entities that can potentially be used to improve the performance of Chinese NER. Therefore, in this study, we explore bilingual enhancement for Chinese NER and propose a unified bilingual interaction module called the Adapted Cross-Transformers with Global Sparse Attention (ACT-S) to capture the interaction of bilingual information. We use a model built upon several different ACT-Ss to integrate the rich English information into the Chinese representation. Moreover, our model can learn both the interaction of information between the two languages (inter-features) and the dependency information within Chinese (intra-features). Compared with existing Chinese NER methods, our proposed model can better handle entities with complex structures. The English text that enhances the model is automatically generated by machine translation, avoiding high labour costs. Experimental results on four well-known benchmark datasets demonstrate the effectiveness and robustness of our proposed model.
2021
面向法律文本的实体关系联合抽取算法(Joint Entity and Relation Extraction for Legal Texts)
Wenhui Song (宋文辉) | Xiang Zhou (周翔) | Ping Yang (杨萍) | Yuanyuan Sun (孙媛媛) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
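The rich information contained in legal texts can be represented as structured entity-relation triples, which facilitates the storage and querying of legal knowledge. Traditional pipeline methods perform a large amount of redundant computation when automatically extracting triples and cause error propagation. Existing joint learning methods, meanwhile, cannot handle legal texts with many overlapping relations and do not consider how syntactic structure information can enhance text representations. This paper therefore proposes a joint entity and relation extraction model for legal texts. The model first injects syntactic information through ON-LSTM and then introduces a multi-head attention mechanism to decompose overlapping relations. Compared with pipeline methods and other joint learning methods, the proposed model achieves the best extraction performance, reaching an F1 score of 78.7% on a dataset of drug-related legal texts.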
2020
Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement
Yanguang Chen | Yuanyuan Sun | Zhihao Yang | Hongfei Lin
Proceedings of the 28th International Conference on Computational Linguistics
In recent years, the plentiful information contained in Chinese legal documents has attracted a great deal of attention because of the large-scale release of judgment documents on China Judgments Online. There is a great need to enable machines to understand the semantic information stored in these documents, which are written in natural language. Information extraction provides a way of mining the valuable information implied in the unstructured judgment documents. We propose a Legal Triplet Extraction System for drug-related criminal judgment documents. The system extracts entities and semantic relations jointly and benefits from the proposed legal lexicon feature and multi-task learning framework. Furthermore, we manually annotate a dataset for Named Entity Recognition and Relation Extraction in the Chinese legal domain, which supports training supervised triplet extraction models and evaluating model performance. Our experimental results show that the legal feature and the multi-task learning framework are feasible and effective for the Legal Triplet Extraction System. The F1 score of triplet extraction reaches 0.836 on the legal dataset.
基于预训练语言模型的案件要素识别方法(A Method for Case Factor Recognition Based on Pre-trained Language Models)
Haishun Liu (刘海顺) | Lei Wang (王雷) | Yanguang Chen (陈彦光) | Shuchen Zhang (张书晨) | Yuanyuan Sun (孙媛媛) | Hongfei Lin (林鸿飞)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
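Case factor recognition refers to automatically extracting the important factual descriptions from a case description and classifying them according to a factor system designed by domain experts; it is an important research topic in the field of intelligent justice. Text encoding based on traditional neural networks struggles to extract deep features, and threshold-based multi-label classification struggles to capture dependencies between labels. This paper therefore proposes a multi-label text classification model based on pre-trained language models. The model uses a language model with a Layer-attentive feature fusion strategy as the encoder and an LSTM-based sequence generation model as the decoder. In experiments on the "CAIL2019" dataset, the method improves the F1 score by up to 7.6% over algorithms based on recurrent neural networks and by about 3.2% over the base language model (BERT) under the same hyperparameter settings.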