一种基于IDLSTM+CRF的中文主地域抽取方法(A Chinese Main Location Extraction Method based on IDLSTM+CRF)

Yiqi Tong (童逸琦), Peigen Ye (叶培根), Biao Fu (付彪), Yidong Chen (陈毅东), Xiaodong Shi (史晓东)


Abstract
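News texts often mention multiple locations. The main location describes the geographic attribute of a text's public-opinion content and is a key attribute for public opinion analysis. So far, little deep learning research has addressed automatic main location extraction. Motivated by this, we build a main location extraction system based on IDLSTM+CRF. The system automatically extracts and completes main location labels through three modules: place name recognition, main location extraction, and main location completion. Experiments on public datasets show that our method outperforms models such as BiLSTM+CRF on the place name recognition task. For the main location extraction task, no standard Chinese evaluation set currently exists. To address this, we annotated and open-sourced a validation set of 1,226 examples and a test set of 1,500 examples. Our main location extraction system achieves extraction accuracy of 91.7% and 84.8% on the two sets, respectively, and has been successfully deployed in an online production environment.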
Anthology ID:
2021.ccl-1.71
Volume:
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month:
August
Year:
2021
Address:
Huhhot, China
Editors:
Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
792–802
Language:
Chinese
URL:
https://aclanthology.org/2021.ccl-1.71
Cite (ACL):
Yiqi Tong, Peigen Ye, Biao Fu, Yidong Chen, and Xiaodong Shi. 2021. 一种基于IDLSTM+CRF的中文主地域抽取方法(A Chinese Main Location Extraction Method based on IDLSTM+CRF). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 792–802, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal):
一种基于IDLSTM+CRF的中文主地域抽取方法(A Chinese Main Location Extraction Method based on IDLSTM+CRF) (Tong et al., CCL 2021)
PDF:
https://aclanthology.org/2021.ccl-1.71.pdf
Data
SQuAD