A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing

Xian-Ling Mao; Yi-Jing Hao; Qiang Zhou (周强); Wen-Qing Yuan; Liner Yang (杨麟儿); He-Yan Huang

Abstract

Topic modeling has been widely applied in data mining in recent years. A common and major challenge in applying topic models to other tasks is accurately interpreting the meaning of each topic. Topic labeling, a major interpretation method, has attracted significant attention recently. However, most previous work focuses only on the effectiveness of topic labeling, and less attention has been paid to creating good topic descriptors quickly; moreover, most existing methods have difficulty assigning labels to newly emerging topics. To solve these problems, we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem over a set of probability vectors. Our experimental results show that the proposed sequential interleaving method, based on locality-sensitive hashing (LSH), efficiently speeds up comparisons among probability distributions, and that the proposed framework generates meaningful labels for interpreting topics, including newly emerging ones.
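
The KNN formulation described in the abstract can be illustrated with a minimal sketch. This is not the paper's sequential interleaving method, whose details are not given on this page; it substitutes generic sign-random-projection LSH as a stand-in. The `LabelIndex` class, the choice of Hellinger distance, the centering step, and the toy label distributions are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


class LabelIndex:
    """Hypothetical index: buckets labeled distributions by an LSH signature."""

    def __init__(self, dim, n_bits=6, seed=0):
        rng = random.Random(seed)
        # One random Gaussian hyperplane per signature bit.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)
        ]
        self.buckets = defaultdict(list)
        self.items = []

    def _key(self, dist):
        # Assumption: subtract the uniform mass so the vectors are not all
        # confined to the positive orthant before taking projection signs.
        mean = 1.0 / len(dist)
        centered = [x - mean for x in dist]
        return tuple(
            1 if sum(c * w for c, w in zip(centered, plane)) >= 0 else 0
            for plane in self.planes
        )

    def add(self, label, dist):
        self.items.append((label, dist))
        self.buckets[self._key(dist)].append((label, dist))

    def query(self, dist, k=1):
        # Compare only against candidates in the same bucket; fall back to
        # an exhaustive scan when the bucket holds fewer than k candidates.
        candidates = self.buckets.get(self._key(dist), [])
        if len(candidates) < k:
            candidates = self.items
        ranked = sorted(candidates, key=lambda item: hellinger(dist, item[1]))
        return [label for label, _ in ranked[:k]]


# Toy candidate labels over a hypothetical four-word vocabulary.
index = LabelIndex(dim=4)
index.add("sports", [0.70, 0.10, 0.10, 0.10])
index.add("finance", [0.10, 0.70, 0.10, 0.10])
index.add("health", [0.10, 0.10, 0.70, 0.10])

# A new topic is labeled by looking up its nearest labeled distributions.
new_topic = [0.10, 0.65, 0.15, 0.10]
print(index.query(new_topic, k=1))
```

Hashing restricts the exact distance computation to one bucket, which is where the speedup over an exhaustive comparison would come from. The fallback to a full scan keeps the sketch from returning fewer than k labels when a bucket is sparse.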

Anthology ID: C16-1315
Volume: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Yuji Matsumoto, Rashmi Prasad
Venue: COLING
Publisher: The COLING 2016 Organizing Committee
Pages: 3339–3348
URL: https://aclanthology.org/C16-1315/
Cite (ACL): Xian-Ling Mao, Yi-Jing Hao, Qiang Zhou, Wen-Qing Yuan, Liner Yang, and Heyan Huang. 2016. A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3339–3348, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing (Mao et al., COLING 2016)
PDF: https://aclanthology.org/C16-1315.pdf
