EdgeGraph: Revisiting Statistical Measures for Language Independent Keyphrase Extraction Leveraging on Bi-grams

Muskan Garg, Amit Gupta


Abstract
The NLP research community resorts to the conventional Word Co-occurrence Network (WCN) for keyphrase extraction, using random-walk sampling mechanisms such as the PageRank algorithm to identify candidate words/phrases. We argue that a WCN is by nature a path-based network and does not follow the core-periphery structure observed in web-page linking networks. Thus, language networks leveraging bi-grams may better represent semantics for keyphrase extraction using random walks. In this work, we use each bi-gram as a node and link adjacent bi-grams to generate an EdgeGraph. We validate this simple yet effective language network on four publicly available datasets, and our extensive experiments show that a random walk over the EdgeGraph representation performs better than one over the conventional WCN. We make our code and supplementary materials available on GitHub.
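The contrast between a word-level WCN and a bi-gram-level EdgeGraph can be made concrete with a small sketch. This is not the authors' released code: the tokenizer, the helper names (`tokenize`, `build_wcn`, `build_edgegraph`, `pagerank`), the use of undirected unweighted edges between adjacent units only, the absence of stopword filtering, a damping factor of 0.85 and a fixed 50 power iterations are all assumptions chosen for illustration.

```python
from collections import defaultdict

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    # Assumption: lowercase whitespace tokenization with simple punctuation stripping.
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def build_wcn(tokens):
    """Conventional WCN sketch: one node per word, an edge between adjacent words."""
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops
            graph[a].add(b)
            graph[b].add(a)
    return graph


def build_edgegraph(tokens):
    """EdgeGraph sketch: one node per bi-gram, an edge between consecutive bi-grams."""
    bigrams = list(zip(tokens, tokens[1:]))
    graph = defaultdict(set)
    for a, b in zip(bigrams, bigrams[1:]):
        if a != b:  # repeated bi-grams collapse into one node; skip self-loops
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected, unweighted graph.

    Every node was added through an edge, so no node is dangling and the
    scores keep summing to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        rank = {
            v: (1 - damping) / n
            + damping * sum(rank[u] / len(graph[u]) for u in graph[v])
            for v in nodes
        }
    return rank


if __name__ == "__main__":
    text = ("keyphrase extraction uses a word co-occurrence network and "
            "random walk; keyphrase extraction over bi-grams uses random walk too")
    tokens = tokenize(text)
    scores = pagerank(build_edgegraph(tokens))
    # Bi-gram nodes are ranked directly as candidate keyphrases.
    for (w1, w2), score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(f"{w1} {w2}: {score:.3f}")
```

Because each EdgeGraph node is already a two-word unit, the top-ranked nodes can be read off as candidate phrases, whereas a WCN ranks single words that must be stitched back into phrases afterwards.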
Anthology ID: 2022.icon-main.1
Volume: Proceedings of the 19th International Conference on Natural Language Processing (ICON)
Month: December
Year: 2022
Address: New Delhi, India
Editors: Md. Shad Akhtar, Tanmoy Chakraborty
Venue: ICON
Publisher: Association for Computational Linguistics
Pages: 1–10
URL: https://aclanthology.org/2022.icon-main.1
Cite (ACL):
Muskan Garg and Amit Gupta. 2022. EdgeGraph: Revisiting Statistical Measures for Language Independent Keyphrase Extraction Leveraging on Bi-grams. In Proceedings of the 19th International Conference on Natural Language Processing (ICON), pages 1–10, New Delhi, India. Association for Computational Linguistics.
Cite (Informal):
EdgeGraph: Revisiting Statistical Measures for Language Independent Keyphrase Extraction Leveraging on Bi-grams (Garg & Gupta, ICON 2022)
PDF: https://aclanthology.org/2022.icon-main.1.pdf