PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages

Yiqing Zhang, Jianzhong Qi, Rui Zhang, Chuandong Yin


Abstract
Publication information on a researcher’s academic homepage provides insights into the researcher’s expertise, research interests, and collaboration networks. We aim to extract all the publication strings from a given academic homepage. This is a challenging task because publication strings on different academic homepages may appear at different positions and in different structures. To capture this positional and structural diversity, we propose an end-to-end hierarchical model named PubSE based on Bi-LSTM-CRF. We further propose an alternating method for training the model. Experiments on real data show that PubSE outperforms state-of-the-art models by up to 11.8% in F1-score.

Anthology ID: D18-1123
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1005–1010
URL: https://aclanthology.org/D18-1123
DOI: 10.18653/v1/D18-1123
Cite (ACL): Yiqing Zhang, Jianzhong Qi, Rui Zhang, and Chuandong Yin. 2018. PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1005–1010, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages (Zhang et al., EMNLP 2018)
PDF: https://aclanthology.org/D18-1123.pdf
Attachment: D18-1123.Attachment.pdf