Yingyi Zhang
2019
Using Human Attention to Extract Keyphrase from Microblog Post
Yingyi Zhang | Chengzhi Zhang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
This paper studies automatic keyphrase extraction on social media. Previous work has achieved promising results on this task but neglects human reading behavior during keyphrase annotation. Human attention is a crucial element of reading behavior: it reveals the relevance of words to the main topics of the target text. This paper therefore aims to integrate human attention into keyphrase extraction models. First, human attention is represented by the reading duration estimated from an eye-tracking corpus. Then, we merge human attention with neural network models through an attention mechanism. We also integrate human attention into unsupervised models. To the best of our knowledge, we are the first to use human attention in keyphrase extraction tasks. Experimental results show that our models achieve significant improvements on two Twitter datasets.
2018
Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts
Yingyi Zhang | Jing Li | Yan Song | Chengzhi Zhang
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Existing keyphrase extraction methods suffer from data sparsity when applied to short and informal texts, especially microblog messages. Enriching context is one way to alleviate this problem. Because conversations are formed by reposting and replying to messages, they provide useful clues for recognizing essential content in target posts and are therefore helpful for keyphrase identification. In this paper, we present a neural keyphrase extraction framework for microblog posts that takes their conversation context into account, in which four types of neural encoders, namely averaged embedding, RNN, attention, and memory networks, are proposed to represent the conversation context. Experimental results on Twitter and Weibo datasets show that our framework with such encoders outperforms state-of-the-art approaches.