Rui Chao


2021

融入篇章信息的文学作品命名实体识别(Document-level Literary Named Entity Recognition)
Yuxiang Jia (贾玉祥) | Rui Chao (晁睿) | Hongying Zan (昝红英) | Huayi Dou (窦华溢) | Shuai Cao (曹帅) | Shuo Xu (徐硕)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

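Named entity recognition is a foundational task for the intelligent analysis of literary works, yet research on named entity recognition in the literary domain remains limited, largely because annotated corpora are lacking. Starting from Jin Yong's novels, this paper annotates named entities in two novels totaling more than 1.8 million characters, labeling over 50,000 entities of 4 types. To address the characteristics of fiction text, the paper proposes a named entity recognition model that incorporates document-level information: it introduces a document-level dictionary that stores the historical states of Chinese characters, and uses a confidence computation to fuse the BiGRU-CRF and Transformer models. Experimental results show that using document-level information effectively improves named entity recognition performance. Finally, we also explore the application of named entity recognition to the construction of social networks in novels.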

2020

Chinese Grammatical Error Diagnosis Based on RoBERTa-BiLSTM-CRF Model
Yingjie Han | Yingjie Yan | Yangchao Han | Rui Chao | Hongying Zan
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

Chinese Grammatical Error Diagnosis (CGED) is a natural language processing task for the NLPTEA6 workshop. The goal of this task is to automatically diagnose grammatical errors in Chinese sentences written by L2 learners. This paper proposes a RoBERTa-BiLSTM-CRF model to detect grammatical errors in sentences. First, the RoBERTa model is used to obtain word vectors. Second, the word vectors are fed into a BiLSTM layer to learn contextual features. Finally, a CRF layer without hand-crafted features processes the BiLSTM output. The globally optimal label sequences are obtained from the CRF state transition matrix and the adjacent labels in the training data. In the experiments, the results of the RoBERTa-CRF and ERNIE-BiLSTM-CRF models are compared, and the impact of the model parameters and the testing datasets is analyzed. In the evaluation results, the recall score of our RoBERTa-BiLSTM-CRF model ranks fourth at the detection level.