Salience Rank: Efficient Keyphrase Extraction with Topic Modeling

Nedelina Teneva, Weiwei Cheng


Abstract
Topical PageRank (TPR) uses the latent topic distributions inferred by Latent Dirichlet Allocation (LDA) to rank noun phrases extracted from documents. The ranking procedure runs PageRank K times, where K is the number of topics in the LDA model. In this paper, we propose a modification of TPR called Salience Rank. Salience Rank needs to run PageRank only once and extracts comparable or better keyphrases on benchmark datasets. In addition to its quality and efficiency benefits, our method can extract keyphrases with varying tradeoffs between topic specificity and corpus specificity.
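
The contrast between K PageRank runs and a single run can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary-based word graph, the linear blend of the two specificity scores, and the parameter name `alpha` are all assumptions made for illustration, since the abstract only states that the tradeoff between topic specificity and corpus specificity can be varied. The specificity scores themselves are taken as given inputs here; their definitions are in the paper.

```python
def personalized_pagerank(graph, teleport, damping=0.85, iters=100):
    """Power iteration on a symmetric weighted word graph.

    graph:    dict word -> dict neighbor -> edge weight (assumed symmetric)
    teleport: dict word -> non-negative preference (normalized below)
    """
    words = list(graph)
    total = sum(teleport[w] for w in words)
    pref = {w: teleport[w] / total for w in words}
    out_weight = {w: sum(graph[w].values()) for w in words}
    rank = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        new_rank = {}
        for w in words:
            # The graph is symmetric, so the neighbors of w are its in-links.
            incoming = sum(
                rank[v] * graph[v][w] / out_weight[v] for v in graph[w]
            )
            new_rank[w] = (1 - damping) * pref[w] + damping * incoming
        rank = new_rank
    return rank


def topical_pagerank_scores(graph, word_given_topic, topic_given_doc):
    """TPR-style scoring: one PageRank run per topic (K runs in total),
    combined using the document's topic weights."""
    combined = {w: 0.0 for w in graph}
    for topic, weight in topic_given_doc.items():
        rank = personalized_pagerank(graph, word_given_topic[topic])
        for w in graph:
            combined[w] += weight * rank[w]
    return combined


def single_run_scores(graph, topic_specificity, corpus_specificity, alpha=0.5):
    """Single-run scoring: fold both specificity scores into one teleport
    vector, then run PageRank once. The linear blend is an assumption."""
    salience = {
        w: (1 - alpha) * topic_specificity[w] + alpha * corpus_specificity[w]
        for w in graph
    }
    return personalized_pagerank(graph, salience)
```

In this sketch, the cost of `topical_pagerank_scores` grows with the number of topics, while `single_run_scores` performs one power iteration regardless of K; changing `alpha` shifts the teleport preference between the two hypothetical specificity inputs.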
Anthology ID:
P17-2084
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
530–535
URL:
https://aclanthology.org/P17-2084
DOI:
10.18653/v1/P17-2084
Cite (ACL):
Nedelina Teneva and Weiwei Cheng. 2017. Salience Rank: Efficient Keyphrase Extraction with Topic Modeling. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 530–535, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Salience Rank: Efficient Keyphrase Extraction with Topic Modeling (Teneva & Cheng, ACL 2017)
PDF:
https://aclanthology.org/P17-2084.pdf