Extending Neural Keyword Extraction with TF-IDF tagset matching

Boshko Koloski; Senja Pollak; Blaž Škrlj; Matej Martinc

Abstract

Keyword extraction is the task of identifying words (or multi-word expressions) that best describe a given document; in news portals, keywords serve to link articles on similar topics. In this work, we develop and evaluate our methods on four novel data sets covering less-represented, morphologically rich languages in the European news media industry (Croatian, Estonian, Latvian, and Russian). First, we evaluate two supervised neural transformer-based methods, Transformer-based Neural Tagger for Keyword Identification (TNT-KID) and Bidirectional Encoder Representations from Transformers (BERT) with an additional Bidirectional Long Short-Term Memory Conditional Random Fields (BiLSTM CRF) classification head, and compare them to a baseline unsupervised approach based on Term Frequency - Inverse Document Frequency (TF-IDF). Next, we show that by combining the keywords retrieved by both neural transformer-based methods and extending the final set of keywords with an unsupervised TF-IDF based technique, we can drastically improve the recall of the system, making it appropriate for use as a recommendation system in the media house environment.
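
The combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `tfidf_scores`, `extend_keywords`), the whitespace tokenizer, the smoothed IDF formula, and the cut-off `k` are all assumptions made for illustration, and the neural predictions are passed in as plain lists rather than produced by TNT-KID or BERT. Multi-word tags and lemmatization, which matter for morphologically rich languages, are left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def tfidf_scores(doc_tokens, corpus_tokens):
    """Return {term: tf-idf} for one document against a tokenized corpus."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    term_freq = Counter(doc_tokens)
    # Smoothed IDF (an assumption): log((1 + N) / (1 + df)).
    return {
        term: (count / len(doc_tokens))
        * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, count in term_freq.items()
    }


def extend_keywords(neural_keywords, document, corpus, tagset, k=10):
    """Merge neural predictions, then pad up to k with TF-IDF-ranked tagset terms."""
    merged = []
    for kw in neural_keywords:  # order-preserving de-duplication
        if kw not in merged:
            merged.append(kw)
    scores = tfidf_scores(tokenize(document), [tokenize(d) for d in corpus])
    # Only terms that occur in the document AND in the known tagset qualify.
    candidates = sorted(
        (t for t in scores if t in tagset and t not in merged),
        key=lambda t: (-scores[t], t),
    )
    return (merged + candidates)[:k]


# Hypothetical toy data; the two lists stand in for TNT-KID and BERT output.
corpus = [
    "The parliament passed the budget law",
    "The football club won the league",
    "Budget talks stall in parliament",
]
tagset = {"parliament", "budget", "football", "league"}
tnt_kid_keywords = ["parliament"]
bert_keywords = ["parliament", "law"]

print(extend_keywords(tnt_kid_keywords + bert_keywords, corpus[0], corpus, tagset, k=4))
# ['parliament', 'law', 'budget']
```

Restricting the TF-IDF candidates to an existing tagset keeps the added keywords within the vocabulary editors already use, which is one plausible reason such an extension would raise recall without introducing arbitrary terms.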

Anthology ID: 2021.hackashop-1.4
Volume: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Month: April
Year: 2021
Address: Online
Editors: Hannu Toivonen, Michele Boggia
Venue: Hackashop
Publisher: Association for Computational Linguistics
Pages: 22–29
URL: https://aclanthology.org/2021.hackashop-1.4/
Cite (ACL): Boshko Koloski, Senja Pollak, Blaž Škrlj, and Matej Martinc. 2021. Extending Neural Keyword Extraction with TF-IDF tagset matching. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, pages 22–29, Online. Association for Computational Linguistics.
Cite (Informal): Extending Neural Keyword Extraction with TF-IDF tagset matching (Koloski et al., Hackashop 2021)
PDF: https://aclanthology.org/2021.hackashop-1.4.pdf
