EdgeGraph: Revisiting Statistical Measures for Language Independent Keyphrase Extraction Leveraging on Bi-grams
Muskan Garg | Amit Gupta
Proceedings of the 19th International Conference on Natural Language Processing (ICON)

The NLP research community resorts to the conventional Word Co-occurrence Network (WCN) for keyphrase extraction, using random-walk sampling mechanisms such as the PageRank algorithm to identify candidate words/phrases. We argue that a WCN is a path-based network and does not follow the core-periphery structure observed in web-page linking networks. Thus, language networks leveraging bi-grams may better represent semantics for keyphrase extraction using random walks. In this work, we use each bi-gram as a node and link adjacent bi-grams together to generate an EdgeGraph. We validate our method on four publicly available datasets to demonstrate the effectiveness of this simple language network, and our extensive experiments show that a random walk over the EdgeGraph representation performs better than one over the conventional WCN. We make our code and supplementary materials available on GitHub.
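To make the contrast between the two networks concrete, here is a minimal sketch, not the authors' released code. It builds a conventional WCN (words as nodes, consecutive words linked) and an EdgeGraph (bi-grams as nodes, consecutive bi-grams linked), then ranks the nodes with plain PageRank by power iteration. Several details are assumptions: whitespace tokenization, no stop-word or POS filtering, undirected unweighted links, and a damping factor of 0.85. The function names `build_wcn`, `build_edgegraph`, and `pagerank` are hypothetical. The abstract does not specify how ranked bi-grams are turned into final keyphrases, so that step is omitted.

```python
def build_wcn(tokens):
    """Conventional word co-occurrence network (window of 2):
    each distinct word is a node; consecutive words are linked (undirected)."""
    graph = {w: set() for w in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops
            graph[a].add(b)
            graph[b].add(a)
    return graph


def build_edgegraph(tokens):
    """EdgeGraph sketch (assumed reading of the abstract): each distinct
    bi-gram is a node; bi-grams adjacent in the text are linked (undirected)."""
    bigrams = list(zip(tokens, tokens[1:]))
    graph = {bg: set() for bg in bigrams}
    for u, v in zip(bigrams, bigrams[1:]):
        if u != v:  # skip self-loops from repeated tokens like "a a a"
            graph[u].add(v)
            graph[v].add(u)
    return graph


def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Standard PageRank by power iteration on an undirected graph given as
    {node: set(neighbours)}. Scores sum to 1."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Mass held by nodes with no neighbours is spread uniformly.
        dangling = sum(score[u] for u in nodes if not graph[u])
        new = {}
        for v in nodes:
            # In an undirected graph, v's in-neighbours are its neighbours.
            incoming = sum(score[u] / len(graph[u]) for u in graph[v])
            new[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    text = ("graph based keyphrase extraction ranks words while "
            "edgegraph keyphrase extraction ranks bi-grams")
    tokens = text.lower().split()
    ranked = pagerank(build_edgegraph(tokens))
    for (w1, w2), s in sorted(ranked.items(), key=lambda kv: -kv[1])[:3]:
        print(f"{w1} {w2}\t{s:.3f}")
```

Note that a bi-gram repeated in different contexts (here "keyphrase extraction") gains extra neighbours in the EdgeGraph. Because the scored units are already two-word phrases, the random walk ranks candidate phrases directly instead of ranking single words that must be recombined afterwards.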


Revisiting Taxonomy Induction over Wikipedia
Amit Gupta | Francesco Piccinno | Mikhail Kozhevnikov | Marius Paşca | Daniele Pighin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Guided by multiple heuristics, a unified taxonomy of entities and categories is distilled from the Wikipedia category network. A comprehensive evaluation, based on the analysis of upward generalization paths, demonstrates that the taxonomy supports generalizations which are more than twice as accurate as the state of the art. The taxonomy is available at


Multiple Document Summarization Using Principal Component Analysis Incorporating Semantic Vector Space Model
Om Vikas | Akhil K Meshram | Girraj Meena | Amit Gupta
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 2, June 2008