基于半监督学习的中文社交文本事件聚类方法(Semi-supervised Method to Cluster Chinese Events on Social Streams)

Hengrui Guo (郭恒睿), Zhongqing Wang (王中卿), Peifeng Li (李培峰), Qiaoming Zhu (朱巧明)


Abstract
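Event clustering for social media aims to cluster short texts according to their event features. Existing event clustering models are mainly either unsupervised or supervised: unsupervised models cluster poorly, while supervised models depend on large amounts of labeled data. To address this, this paper proposes a semi-supervised event clustering model (SemiEC). Starting from a small set of labeled data, the model uses an LSTM to represent events and a linear model to compute text similarity, then performs incremental clustering. The labeled data produced by incremental clustering is used to retrain the model, and once this finishes the uncertain samples are re-clustered. Experiments show that SemiEC outperforms the other models it is compared with.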
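The incremental clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details, so the sketch makes several assumptions. Texts are assumed to be already encoded as fixed-length vectors (SemiEC uses an LSTM for this), cosine similarity stands in for the learned linear similarity model, and the function name `incremental_cluster` and both thresholds are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(vectors, join_threshold=0.8, confident_threshold=0.95):
    """Assign each vector, in stream order, to the most similar existing
    cluster or open a new one.

    Assumptions (not from the paper): cosine similarity replaces the
    learned linear similarity model, and the two thresholds are
    hypothetical. Items that join a cluster with similarity below
    confident_threshold are reported as uncertain, so a later pass
    could re-cluster them.
    """
    cluster_sums = []  # running vector sum per cluster
    labels = []        # cluster id assigned to each input vector
    uncertain = []     # indices of low-confidence assignments
    for idx, vec in enumerate(vectors):
        best_id, best_sim = -1, -1.0
        for cid, total in enumerate(cluster_sums):
            # Cosine against the sum equals cosine against the centroid.
            sim = cosine(vec, total)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_sim >= join_threshold:
            cluster_sums[best_id] = [s + v for s, v in zip(cluster_sums[best_id], vec)]
            labels.append(best_id)
            if best_sim < confident_threshold:
                uncertain.append(idx)
        else:
            cluster_sums.append(list(vec))
            labels.append(len(cluster_sums) - 1)
    return labels, uncertain


# Toy 2-D "embeddings": two clear events and one borderline text.
toy_vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99], [0.8, 0.6]]
toy_labels, toy_uncertain = incremental_cluster(toy_vectors)
print(toy_labels, toy_uncertain)
```

In a semi-supervised loop of the kind the abstract describes, the confident assignments would presumably serve as additional training labels for the encoder and similarity model, while the indices returned in `uncertain` would be held back for the final re-clustering pass.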
Anthology ID:
2020.ccl-1.59
Volume:
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Month:
October
Year:
2020
Address:
Haikou, China
Editors:
Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
634–644
Language:
Chinese
URL:
https://aclanthology.org/2020.ccl-1.59
Cite (ACL):
Hengrui Guo, Zhongqing Wang, Peifeng Li, and Qiaoming Zhu. 2020. 基于半监督学习的中文社交文本事件聚类方法(Semi-supervised Method to Cluster Chinese Events on Social Streams). In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 634–644, Haikou, China. Chinese Information Processing Society of China.
Cite (Informal):
基于半监督学习的中文社交文本事件聚类方法(Semi-supervised Method to Cluster Chinese Events on Social Streams) (Guo et al., CCL 2020)
PDF:
https://aclanthology.org/2020.ccl-1.59.pdf