Mitigating Over-Generation for Unsupervised Keyphrase Extraction with Heterogeneous Centrality Detection

Mingyang Song, Pengyu Xu, Yi Feng, Huafeng Liu, Liping Jing


Abstract
Over-generation errors occur when a keyphrase extraction model correctly identifies a candidate as a keyphrase because it contains a word that appears frequently in the document, but at the same time erroneously outputs other candidates as keyphrases simply because they contain the same word. To mitigate this issue, we propose a new heterogeneous centrality detection approach (CentralityRank), which extracts keyphrases by simultaneously identifying both implicit and explicit centrality within a heterogeneous graph and using them as the importance score of each candidate. More specifically, CentralityRank exploits the content of the input document to construct graphs whose semantic nodes span multiple granularity levels rather than phrases alone. These additional nodes act as intermediaries between candidate keyphrases, strengthening cross-phrase relations. Furthermore, we introduce a novel adaptive boundary-aware regularization that leverages the position information of candidate keyphrases to adjust their importance scores. Extensive experiments on three benchmark datasets show that CentralityRank outperforms recent state-of-the-art unsupervised keyphrase extraction baselines.
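The abstract describes scoring each candidate by combining explicit and implicit centrality in a graph that also contains non-phrase nodes, then adjusting that score by position. The Python below is a minimal, hypothetical sketch of that general idea, not the CentralityRank implementation. It assumes precomputed embedding vectors, takes "explicit" centrality to be cosine similarity to a document embedding and "implicit" centrality to be mean cosine similarity to all other graph nodes (candidates plus intermediary nodes such as sentences), and substitutes a simple fixed U-shaped position weight for the paper's adaptive boundary-aware regularization. The function names, the mixing weight `alpha`, and the weighting formula are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_candidates(
    candidates: Dict[str, Vector],
    extra_nodes: List[Vector],
    doc: Vector,
    first_pos: Dict[str, int],
    doc_len: int,
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical heterogeneous-centrality score for each candidate phrase."""
    names = list(candidates)
    # Graph nodes: all candidates first, then intermediary nodes (e.g. sentences).
    nodes: List[Vector] = [candidates[n] for n in names] + list(extra_nodes)
    scores: Dict[str, float] = {}
    for i, name in enumerate(names):
        vec = nodes[i]
        # "Explicit" centrality (assumed): similarity to the whole document.
        explicit = cosine(vec, doc)
        # "Implicit" centrality (assumed): mean similarity to every other node.
        sims = [cosine(vec, other) for j, other in enumerate(nodes) if j != i]
        implicit = sum(sims) / len(sims) if sims else 0.0
        # Hypothetical fixed boundary weight: 1.0 at the first or last token,
        # 0.5 in the middle (a stand-in for the adaptive regularization).
        p = first_pos[name] / max(doc_len - 1, 1)
        boundary = max(p, 1.0 - p)
        scores[name] = (alpha * explicit + (1.0 - alpha) * implicit) * boundary
    return scores


def top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring candidates, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    toy_scores = score_candidates(
        candidates={"graph centrality": (1.0, 1.0), "other phrase": (1.0, -1.0)},
        extra_nodes=[(1.0, 0.5)],  # e.g. one sentence embedding
        doc=(1.0, 1.0),
        first_pos={"graph centrality": 0, "other phrase": 5},
        doc_len=11,
    )
    print(top_k(toy_scores, k=1))
```

In this sketch the intermediary vectors in `extra_nodes` affect only the implicit term, which is one way such nodes can link candidates that share no surface words; the paper's actual graph construction and regularization may differ.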
Anthology ID:
2023.emnlp-main.1017
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
16349–16359
URL:
https://aclanthology.org/2023.emnlp-main.1017
DOI:
10.18653/v1/2023.emnlp-main.1017
Cite (ACL):
Mingyang Song, Pengyu Xu, Yi Feng, Huafeng Liu, and Liping Jing. 2023. Mitigating Over-Generation for Unsupervised Keyphrase Extraction with Heterogeneous Centrality Detection. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16349–16359, Singapore. Association for Computational Linguistics.
Cite (Informal):
Mitigating Over-Generation for Unsupervised Keyphrase Extraction with Heterogeneous Centrality Detection (Song et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.1017.pdf
Video:
https://aclanthology.org/2023.emnlp-main.1017.mp4