古汉语嵌套命名实体识别数据集的构建和应用研究(Construction and application of classical Chinese nested named entity recognition data set)

Zhiqiang Xie (谢志强), Jinzhu Liu (刘金柱), Genhui Liu (刘根辉)


Abstract
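This paper focuses on the under-studied task of nested named entity recognition (NER) in classical Chinese, using the Records of the Grand Historian (《史记》, Shiji) as the raw corpus. To address the ambiguity in entity classification caused by the rich meanings of classical Chinese text, we construct two classical Chinese nested named entity datasets under two annotation standards, one based on the literal meaning of characters and words and one based on their contextual meaning, and we discuss the datasets' entity classification principles and annotation format. In a comparative experiment with the RoBERTa-classical-chinese+GlobalPointer model, the Standard 1 dataset reaches an F1 of 80.42% and the Standard 2 dataset an F1 of 77.43%, and this result is used to settle the dataset's annotation standard. We then compare six pre-trained models combined with GlobalPointer on the classical Chinese nested NER task. In the final experiments, the RoBERTa-classical-chinese model performs best, with an F1 of 84.71%.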
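The abstract reports F1 scores on nested entities without stating how they are computed. As a minimal sketch, assuming the common exact-match, micro-averaged convention for span-based models such as GlobalPointer, nested entities can be stored as a set of (start, end, label) spans per sentence and scored as below. The example sentence, the label names (`LOC`, `TITLE`, `PER`), and the function name `span_f1` are hypothetical illustrations, not the paper's label set or evaluation code.

```python
from typing import List, Set, Tuple

# A span is (start, end_exclusive, label) over character offsets.
# Nested entities are simply overlapping spans in the same set.
Span = Tuple[int, int, str]


def span_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 with exact span+label matching.

    Assumption: a predicted span counts as correct only if its boundaries
    and label both match a gold span exactly.
    """
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: "齐王田广", where "齐" (a place) is nested
# inside "齐王" (a title), followed by the personal name "田广".
gold_spans = [{(0, 1, "LOC"), (0, 2, "TITLE"), (2, 4, "PER")}]
pred_spans = [{(0, 1, "LOC"), (2, 4, "PER"), (0, 4, "PER")}]

precision, recall, f1 = span_f1(gold_spans, pred_spans)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Storing spans in a set keeps the inner and outer entity side by side, which a flat BIO tag sequence cannot represent directly.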
Anthology ID:
2022.ccl-1.37
Volume:
Proceedings of the 21st Chinese National Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Nanchang, China
Editors:
Maosong Sun (孙茂松), Yang Liu (刘洋), Wanxiang Che (车万翔), Yang Feng (冯洋), Xipeng Qiu (邱锡鹏), Gaoqi Rao (饶高琦), Yubo Chen (陈玉博)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
406–416
Language:
Chinese
URL:
https://aclanthology.org/2022.ccl-1.37
Cite (ACL):
Zhiqiang Xie, Jinzhu Liu, and Genhui Liu. 2022. 古汉语嵌套命名实体识别数据集的构建和应用研究(Construction and application of classical Chinese nested named entity recognition data set). In Proceedings of the 21st Chinese National Conference on Computational Linguistics, pages 406–416, Nanchang, China. Chinese Information Processing Society of China.
Cite (Informal):
古汉语嵌套命名实体识别数据集的构建和应用研究(Construction and application of classical Chinese nested named entity recognition data set) (Xie et al., CCL 2022)
PDF:
https://aclanthology.org/2022.ccl-1.37.pdf