Manvith Reddy
2020
N-Grams TextRank A Novel Domain Keyword Extraction Technique
Saransh Rajput
|
Akshat Gahoi
|
Manvith Reddy
|
Dipti Mishra Sharma
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TermTraction 2020 Shared Task
The rapid growth of the internet has given us a wealth of information and data spread across the web. However, as the data begins to grow we simultaneously face the grave problem of an Information Explosion. An abundance of data can lead to large scale data management problems as well as the loss of the true meaning of the data. In this paper, we present an advanced domain specific keyword extraction algorithm in order to tackle this problem of paramount importance. Our algorithm is based on a modified version of TextRank algorithm - an algorithm based on PageRank to successfully determine the keywords from a domain specific document. Furthermore, this paper proposes a modification to the traditional TextRank algorithm that takes into account bigrams and trigrams and returns results with an extremely high precision. We observe how the precision and f1-score of this model outperforms other models in many domains and the recall can be easily increased by increasing the number of results without affecting the precision. We also discuss about the future work of extending the same algorithm to Indian languages.