Yanguang Chen


2020

pdf bib
基于预训练语言模型的案件要素识别方法(A Method for Case Factor Recognition Based on Pre-trained Language Models)
Haishun Liu (刘海顺) | Lei Wang (王雷) | Yanguang Chen (陈彦光) | Shuchen Zhang (张书晨) | Yuanyuan Sun (孙媛媛) | Hongfei Lin (林鸿飞)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

案件要素识别指将案件描述中重要事实描述自动抽取出来,并根据领域专家设计的要素体系进行分类,是智慧司法领域的重要研究内容。基于传统神经网络的文本编码难以提取深层次特征,基于阈值的多标签分类难以捕获标签间依赖关系,因此本文提出了基于预训练语言模型的多标签文本分类模型。该模型采用以Layer-attentive策略进行特征融合的语言模型作为编码器,使用基于LSTM的序列生成模型作为解码器。在“CAIL2019”数据集上进行实验,该方法比基于循环神经网络的算法在F1值上最高可提升7.6%,在相同超参数设置下比基础语言模型(BERT)提升约3.2%。

pdf bib
Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement
Yanguang Chen | Yuanyuan Sun | Zhihao Yang | Hongfei Lin
Proceedings of the 28th International Conference on Computational Linguistics

In recent years, the plentiful information contained in Chinese legal documents has attracted a great deal of attention because of the large-scale release of the judgment documents on China Judgments Online. It is in great need of enabling machines to understand the semantic information stored in the documents which are transcribed in the form of natural language. The technique of information extraction provides a way of mining the valuable information implied in the unstructured judgment documents. We propose a Legal Triplet Extraction System for drug-related criminal judgment documents. The system extracts the entities and the semantic relations jointly and benefits from the proposed legal lexicon feature and multi-task learning framework. Furthermore, we manually annotate a dataset for Named Entity Recognition and Relation Extraction in Chinese legal domain, which contributes to training supervised triplet extraction models and evaluating the model performance. Our experimental results show that the legal feature introduction and multi-task learning framework are feasible and effective for the Legal Triplet Extraction System. The F1 score of triplet extraction finally reaches 0.836 on the legal dataset.