Krutarth Patel
2021
Exploiting Position and Contextual Word Embeddings for Keyphrase Extraction from Scientific Papers
Krutarth Patel, Cornelia Caragea
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Keyphrases associated with research papers provide an effective way to find useful information in large and growing scholarly digital collections. In this paper, we present KPRank, an unsupervised graph-based algorithm for keyphrase extraction that incorporates both positional information and contextual word embeddings into a biased PageRank. Our experimental results on five benchmark datasets show that KPRank, which uses contextual word embeddings with an additional position signal, outperforms previous approaches and strong baselines for this task.
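The abstract mentions a PageRank that is biased by word position. The sketch below shows one common way to build such a bias, using the PositionRank-style scheme in which each word's teleport weight is the sum of the inverse positions where it occurs. It is not KPRank itself: the contextual word embedding signal is omitted entirely, and the function name `position_biased_pagerank` and its parameters (`window`, `damping`, `iters`, `tol`) are hypothetical choices made for illustration.

```python
from collections import defaultdict


def position_biased_pagerank(words, window=2, damping=0.85, iters=100, tol=1e-8):
    """Rank words with a PageRank whose teleport vector favors early positions.

    Hypothetical sketch: the bias is the sum of 1/position over a word's
    occurrences (PositionRank-style); no embedding signal is used.
    """
    if not words:
        return {}

    # Teleport (personalization) vector: earlier and repeated words get more mass.
    bias = defaultdict(float)
    for pos, w in enumerate(words, start=1):
        bias[w] += 1.0 / pos
    total = sum(bias.values())
    p = {w: b / total for w, b in bias.items()}
    nodes = list(p)

    # Undirected co-occurrence graph; edge weight = co-occurrences within `window`.
    adj = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if v != w:
                adj[w][v] += 1.0
                adj[v][w] += 1.0
    out_weight = {u: sum(adj[u].values()) for u in nodes}

    # Power iteration with biased teleportation.
    score = dict(p)
    for _ in range(iters):
        # Mass held by isolated nodes is redistributed according to the bias.
        dangling = sum(score[u] for u in nodes if out_weight[u] == 0)
        new = {}
        for v in nodes:
            incoming = sum(score[u] * adj[u][v] / out_weight[u] for u in adj[v])
            new[v] = (1 - damping) * p[v] + damping * (incoming + dangling * p[v])
        converged = sum(abs(new[v] - score[v]) for v in nodes) < tol
        score = new
        if converged:
            break
    return score


words = ["keyphrase", "extraction", "graph", "keyphrase", "ranking"]
scores = position_biased_pagerank(words)
print(sorted(scores, key=scores.get, reverse=True))
```

A full keyphrase extractor would typically also filter candidate words (e.g., by part of speech) and score multi-word phrases by combining the scores of their words. Those steps are left out here.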
2020
On the Use of Web Search to Improve Scientific Collections
Krutarth Patel, Cornelia Caragea, Sujatha Das Gollapalli
Proceedings of the First Workshop on Scholarly Document Processing
Despite the advancements in search engine features, ranking methods, technologies, and the availability of programmable APIs, current-day open-access digital libraries still rely on crawl-based approaches for acquiring their underlying document collections. In this paper, we propose a novel search-driven framework for acquiring documents for such scientific portals. Within our framework, publicly-available research paper titles and author names are used as queries to a Web search engine. We were able to obtain ~267,000 unique research papers through our fully-automated framework using ~76,000 queries, resulting in almost 200,000 more papers than the number of queries. Moreover, through a combination of title and author name search, we were able to recover 78% of the original searched titles.
Dynamic Classification in Web Archiving Collections
Krutarth Patel, Cornelia Caragea, Mark Phillips
Proceedings of the Twelfth Language Resources and Evaluation Conference
Web archive data usually contain high-quality documents that are very useful for creating specialized collections of documents. To create such collections, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection from the large collections (millions of documents in size) held by Web archiving institutions. However, the patterns of the documents of interest can differ substantially from one document to another, which makes the automatic classification task very challenging. In this paper, we explore dynamic fusion models to find, on the fly, the model or combination of models that performs best on a variety of document types. Our experimental results show that the approach that fuses different models outperforms individual models and other ensemble methods on three datasets.