Open Domain Web Keyphrase Extraction Beyond Language Modeling

Lee Xiong; Chuan Hu; Chenyan Xiong; Daniel Campos; Arnold Overwijk

doi:10.18653/v1/D19-1521

Abstract

This paper studies keyphrase extraction in real-world scenarios where documents come from diverse domains and vary in content quality. We curate and release OpenKP, a large-scale open-domain keyphrase extraction dataset with nearly one hundred thousand web documents and expert keyphrase annotations. To handle the variations in domain and content quality, we develop BLING-KPE, a neural keyphrase extraction model that goes beyond language understanding by using the visual presentation of documents and weak supervision from search queries. Experimental results on OpenKP confirm the effectiveness of BLING-KPE and the contributions of its neural architecture, visual features, and search log weak supervision. Zero-shot evaluations on DUC-2001 demonstrate that learning from open-domain data generalizes better than learning from a specific domain.

Anthology ID: D19-1521
Volume: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues: EMNLP | IJCNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 5175–5184
URL: https://aclanthology.org/D19-1521/
DOI: 10.18653/v1/D19-1521
Cite (ACL): Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, and Arnold Overwijk. 2019. Open Domain Web Keyphrase Extraction Beyond Language Modeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5175–5184, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Open Domain Web Keyphrase Extraction Beyond Language Modeling (Xiong et al., EMNLP-IJCNLP 2019)
PDF: https://aclanthology.org/D19-1521.pdf
Attachment: D19-1521.Attachment.zip
