Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction

Yansen Wang, Zhen Fan, Carolyn Rose


Abstract
Open-domain keyphrase extraction (KPE) on the Web is a fundamental yet complex NLP task with a wide range of practical applications in Information Retrieval. Unlike other document types, web pages are designed for easy navigation and information finding. Effective designs encode signals in their layout and formatting that point to where the important information can be found. In this work, we propose a modeling approach that leverages these multimodal signals to aid in the KPE task. In particular, we leverage both lexical and visual features (e.g., size, font, position) at the micro-level to enable effective strategy induction, and meta-level features that describe pages at the macro-level to aid in strategy selection. Our evaluation demonstrates that, for the KPE task, combining effective strategy induction and strategy selection within this approach outperforms state-of-the-art models. A qualitative post-hoc analysis illustrates how these features function within the model.
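The abstract separates micro-level features (lexical and visual, per token) used for strategy induction from macro-level page features used for strategy selection. One way to picture that split is a gated mixture: several scoring "strategies" each rate a token from its concatenated lexical and visual features, and a softmax gate computed from page-level meta features decides how much each strategy counts. This is a hypothetical sketch for intuition only, not the authors' architecture; the names `Token`, `score_tokens`, the feature layouts, and all weights are illustrative assumptions, and the actual model is described in the paper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token representation (not the paper's data format)."""
    text: str
    lexical: list[float]  # assumed lexical features, e.g. from a text encoder
    visual: list[float]   # assumed visual features: [font_size, is_bold, relative_y]


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def score_tokens(tokens: list[Token],
                 strategies: list[list[float]],
                 gate_weights: list[list[float]],
                 page_meta: list[float]) -> list[float]:
    """Score tokens as a page-gated mixture of strategy scores (illustrative).

    strategies:   one weight vector per strategy over [lexical + visual]
                  token features (micro-level).
    gate_weights: one weight vector per strategy over page_meta
                  (macro-level); softmax turns them into mixing weights.
    """
    gate = softmax([dot(w, page_meta) for w in gate_weights])
    scores = []
    for tok in tokens:
        feats = tok.lexical + tok.visual  # concatenate micro-level modalities
        per_strategy = [dot(w, feats) for w in strategies]
        scores.append(sum(g * s for g, s in zip(gate, per_strategy)))
    return scores


if __name__ == "__main__":
    tokens = [
        Token("keyphrase", lexical=[1.0], visual=[2.0, 1.0, 0.1]),
        Token("the", lexical=[0.1], visual=[1.0, 0.0, 0.9]),
    ]
    # Strategy 0 trusts lexical evidence; strategy 1 rewards large, bold,
    # near-the-top text. Both are made-up examples.
    strategies = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, -1.0]]
    gate_weights = [[1.0, 0.0], [0.0, 1.0]]
    page_meta = [0.0, 2.0]  # hypothetical page-level meta features
    scores = score_tokens(tokens, strategies, gate_weights, page_meta)
    for score, text in sorted(zip(scores, (t.text for t in tokens)), reverse=True):
        print(f"{text}: {score:.3f}")
```

Here the page-level features shift most of the gate toward the layout-driven strategy, so the large, bold token near the top of the page ranks first. A learned model would fit the strategy and gate weights from data rather than fixing them by hand.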
Anthology ID: 2020.emnlp-main.140
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1790–1800
URL: https://aclanthology.org/2020.emnlp-main.140
DOI: 10.18653/v1/2020.emnlp-main.140
Cite (ACL): Yansen Wang, Zhen Fan, and Carolyn Rose. 2020. Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1790–1800, Online. Association for Computational Linguistics.
Cite (Informal): Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction (Wang et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.140.pdf
Optional supplementary material: 2020.emnlp-main.140.OptionalSupplementaryMaterial.zip
Video: https://slideslive.com/38938750