SAMRank: Unsupervised Keyphrase Extraction using Self-Attention Map in BERT and GPT-2

Byungha Kang, Youhyun Shin


Abstract
We propose a novel unsupervised keyphrase extraction approach, called SAMRank, which uses only a self-attention map in a pre-trained language model (PLM) to determine the importance of phrases. Most recent approaches to unsupervised keyphrase extraction rely on contextualized embeddings to capture semantic relevance between words, sentences, and documents. However, due to the anisotropic nature of contextual embeddings, these approaches may not be optimal for semantic similarity measurements. SAMRank computes the importance of phrases by leveraging only a self-attention map in a PLM, in this case BERT and GPT-2, eliminating the need to measure embedding similarities. To assess the level of importance, SAMRank combines global and proportional attention scores, both calculated from the self-attention map. We evaluate SAMRank on three keyphrase extraction datasets: Inspec, SemEval2010, and SemEval2017. The experimental results show that SAMRank outperforms most embedding-based models on both long and short documents, demonstrating that it is possible to use only a self-attention map for keyphrase extraction without relying on embeddings. Source code is available at https://github.com/kangnlp/SAMRank.
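The abstract does not give formulas, so the following is only a minimal sketch of how global and proportional attention scores might be derived from a single self-attention map. It assumes a row-stochastic matrix in which entry (i, j) is the attention token i pays to token j. The function names, the toy matrix, the row-weighting used for the proportional score, and the additive combination are all illustrative assumptions, not the authors' formulation; the paper and the linked repository define the actual computation.

```python
import numpy as np


def global_scores(attn: np.ndarray) -> np.ndarray:
    """Attention each token receives from all tokens (column sums)."""
    return attn.sum(axis=0)


def proportional_scores(attn: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Assumed reading: weight each token's outgoing attention by its own
    global score, normalize every column to sum to 1, then sum each row."""
    weighted = attn * g[:, np.newaxis]           # scale row i by g[i]
    col_sums = weighted.sum(axis=0, keepdims=True)
    col_sums = np.where(col_sums == 0.0, 1.0, col_sums)  # avoid divide-by-zero
    return (weighted / col_sums).sum(axis=1)


def phrase_score(token_scores: np.ndarray, token_indices: list[int]) -> float:
    """Assumed aggregation: a phrase's score is the sum of its tokens' scores."""
    return float(token_scores[token_indices].sum())


# Toy 3-token attention map (hypothetical values; each row sums to 1).
attn = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
])

g = global_scores(attn)
p = proportional_scores(attn, g)
token_scores = g + p  # assumed combination of the two scores

# Rank a hypothetical two-token candidate phrase against a single token.
candidates = {"tokens 0-1": [0, 1], "token 2": [2]}
ranking = sorted(
    candidates,
    key=lambda name: phrase_score(token_scores, candidates[name]),
    reverse=True,
)
print(ranking)
```

In a real setting the attention map would come from a specific layer and head of BERT or GPT-2 rather than a hand-written matrix, and candidate phrases would be selected from the document before scoring.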
Anthology ID:
2023.emnlp-main.630
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10188–10201
URL:
https://aclanthology.org/2023.emnlp-main.630
DOI:
10.18653/v1/2023.emnlp-main.630
Cite (ACL):
Byungha Kang and Youhyun Shin. 2023. SAMRank: Unsupervised Keyphrase Extraction using Self-Attention Map in BERT and GPT-2. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10188–10201, Singapore. Association for Computational Linguistics.
Cite (Informal):
SAMRank: Unsupervised Keyphrase Extraction using Self-Attention Map in BERT and GPT-2 (Kang & Shin, EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.630.pdf
Video:
https://aclanthology.org/2023.emnlp-main.630.mp4