Keyphrase Extraction with Incomplete Annotated Training Data

Yanfei Lei, Chunming Hu, Guanghui Ma, Richong Zhang


Abstract
Extracting keyphrases that summarize the main points of a document is a fundamental task in natural language processing. Supervised approaches to keyphrase extraction (KPE) are largely developed on the assumption that the training data is fully annotated. However, because keyphrase annotation is difficult, KPE models suffer severely from incompletely annotated training data in many scenarios. To this end, we propose a more robust training method that learns to mitigate the misguidance caused by unlabeled keyphrases. We introduce negative sampling to adjust the training loss, and conduct experiments under different scenarios. Empirical studies on synthetic datasets and an open-domain dataset show that our model is robust to the incomplete annotation problem and surpasses prior baselines. Extensive experiments on five scientific-domain datasets of different scales demonstrate that our model is competitive with the state-of-the-art method.
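The abstract does not spell out how negative sampling adjusts the loss, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes span-level candidates, a binary cross-entropy objective, and uniform sampling; the function name `negative_sampling_loss` and the `sample_rate` parameter are hypothetical. Annotated keyphrases always contribute as positives, while only a random subset of unlabeled candidates is penalized as negatives, so a keyphrase that annotators missed is less likely to be pushed toward a "not a keyphrase" prediction.

```python
import math
import random


def negative_sampling_loss(probs, labels, sample_rate=0.5, seed=0):
    """Mean binary cross-entropy over positives plus sampled negatives.

    probs  : predicted keyphrase probability per candidate span, each in (0, 1)
    labels : 1 if the span is an annotated keyphrase, 0 if it is unlabeled
    Returns (loss, indices of the unlabeled spans used as negatives).
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    unlabeled = [i for i, y in enumerate(labels) if y == 0]

    # Assumption: sample unlabeled spans uniformly; keep at least one if any exist.
    k = 0
    if unlabeled:
        k = min(len(unlabeled), max(1, round(sample_rate * len(unlabeled))))
    negatives = rng.sample(unlabeled, k)

    # Positives are pushed toward 1, sampled negatives toward 0.
    loss = -sum(math.log(probs[i]) for i in positives)
    loss -= sum(math.log(1.0 - probs[i]) for i in negatives)

    # Unsampled unlabeled spans contribute nothing to the loss.
    return loss / max(1, len(positives) + len(negatives)), negatives


if __name__ == "__main__":
    # Toy document with four candidate spans; only the first is annotated.
    probs = [0.9, 0.2, 0.8, 0.1]
    labels = [1, 0, 0, 0]
    loss, used = negative_sampling_loss(probs, labels, sample_rate=1 / 3)
    print(f"loss={loss:.4f}, negatives used={used}")
```

With `sample_rate=1.0` the function reduces to ordinary binary cross-entropy over all candidates, which corresponds to the fully annotated assumption that the abstract argues against.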
Anthology ID:
2021.wnut-1.4
Volume:
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Month:
November
Year:
2021
Address:
Online
Venues:
EMNLP | WNUT
Publisher:
Association for Computational Linguistics
Pages:
26–34
URL:
https://aclanthology.org/2021.wnut-1.4
DOI:
10.18653/v1/2021.wnut-1.4
Cite (ACL):
Yanfei Lei, Chunming Hu, Guanghui Ma, and Richong Zhang. 2021. Keyphrase Extraction with Incomplete Annotated Training Data. In Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), pages 26–34, Online. Association for Computational Linguistics.
Cite (Informal):
Keyphrase Extraction with Incomplete Annotated Training Data (Lei et al., WNUT 2021)
PDF:
https://aclanthology.org/2021.wnut-1.4.pdf
Software:
2021.wnut-1.4.Software.zip
Data:
KP20k