Keyword Extraction Using Unsupervised Learning on the Document’s Adjacency Matrix

Eirini Papagiannopoulou, Grigorios Tsoumakas, Apostolos Papadopoulos


Abstract
This work revisits the information provided by the graph-of-words and its typical use in graph-based ranking approaches to keyword extraction. Recent, well-known graph-based approaches typically exploit word vector representations during the ranking process via popular centrality measures (e.g., PageRank), without giving the primary role to the distribution of the vectors. We treat the adjacency matrix of a target document's graph-of-words as the vector representation of its vocabulary, and we propose distribution-based modeling of this adjacency matrix using unsupervised learning algorithms. An extensive experimental study, evaluated by F1 score, confirms the efficacy of the distribution-based modeling approaches compared to state-of-the-art graph-based methods. Our code is available on GitHub.
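To make the central idea concrete, the sketch below builds a graph-of-words adjacency matrix for a short text and treats each row as the vector of the corresponding word. It is only an illustration under stated assumptions: the sliding-window co-occurrence rule, the `window` parameter, the function names, and the toy sentence are hypothetical and not taken from the paper, and the distance-from-mean score is a simple stand-in for a distribution-based model, not the authors' algorithm.

```python
from math import sqrt


def adjacency_matrix(tokens, window=2):
    """Build a symmetric co-occurrence adjacency matrix for a graph-of-words.

    Assumption: two words are linked when they occur within `window`
    positions of each other; the edge weight counts those co-occurrences.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for pos, word in enumerate(tokens):
        for other in tokens[pos + 1 : pos + 1 + window]:
            if other == word:
                continue  # skip self-loops
            i, j = index[word], index[other]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return vocab, matrix


def distance_from_mean(matrix):
    """Score each row (word vector) by its Euclidean distance from the mean row.

    Assumption: this is a toy distribution-based score chosen for
    illustration, not the modeling used in the paper.
    """
    n = len(matrix)
    mean = [sum(row[k] for row in matrix) / n for k in range(n)]
    return [sqrt(sum((row[k] - mean[k]) ** 2 for k in range(n))) for row in matrix]


# Hypothetical toy document, already tokenized and lowercased.
tokens = "graph based keyword extraction ranks words in a word graph".split()
vocab, matrix = adjacency_matrix(tokens, window=2)
scores = distance_from_mean(matrix)

# Words whose rows deviate most from the average row come first.
ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
```

Here each row of `matrix` plays the role of a word's vector representation, so any unsupervised model that operates on row vectors could replace `distance_from_mean` without changing the rest of the sketch.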
Anthology ID: 2021.textgraphs-1.9
Volume: Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)
Month: June
Year: 2021
Address: Mexico City, Mexico
Venues: NAACL | TextGraphs
Publisher: Association for Computational Linguistics
Pages: 94–105
URL: https://aclanthology.org/2021.textgraphs-1.9
DOI: 10.18653/v1/2021.textgraphs-1.9
Cite (ACL): Eirini Papagiannopoulou, Grigorios Tsoumakas, and Apostolos Papadopoulos. 2021. Keyword Extraction Using Unsupervised Learning on the Document’s Adjacency Matrix. In Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15), pages 94–105, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Keyword Extraction Using Unsupervised Learning on the Document’s Adjacency Matrix (Papagiannopoulou et al., TextGraphs 2021)
PDF: https://aclanthology.org/2021.textgraphs-1.9.pdf