Global Bootstrapping Neural Network for Entity Set Expansion

Lingyong Yan, Xianpei Han, Ben He, Le Sun


Abstract
Bootstrapping for entity set expansion (ESE), which expands an entity set using only a few seed entities as supervision, has been studied for a long time. Recent end-to-end bootstrapping approaches have shown their advantages in information capturing and bootstrapping process modeling. However, due to the sparse supervision problem, previous end-to-end methods often leverage only information from near neighborhoods (local semantics) rather than information propagated through the co-occurrence structure of the whole corpus (global semantics). To address this issue, this paper proposes the Global Bootstrapping Network (GBN) with “pre-training and fine-tuning” strategies for effective learning. Specifically, it contains a global-sighted encoder that captures and encodes both local and global semantics into entity embeddings, and an attention-guided decoder that sequentially expands new entities based on these embeddings. The experimental results show that a GBN learned with the “pre-training and fine-tuning” strategies achieves state-of-the-art performance on two bootstrapping datasets.
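To make the general setting concrete: bootstrapping ESE starts from a few seed entities and iteratively adds the candidates most similar to the current set. The sketch below is a generic, embedding-similarity baseline of that loop, not the paper's GBN; it omits the global-sighted encoder and attention-guided decoder, and the toy embeddings are invented for illustration.

```python
# Illustrative sketch of a similarity-based bootstrapping loop for entity set
# expansion. NOT the GBN model from the paper: no global-sighted encoder, no
# attention-guided decoder. Toy embeddings below are hypothetical.
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand(seeds, embeddings, rounds=2):
    """Iteratively add the candidate with the highest mean similarity
    to the entities expanded so far."""
    expanded = list(seeds)
    for _ in range(rounds):
        candidates = [e for e in embeddings if e not in expanded]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda c: sum(
                cosine(embeddings[c], embeddings[e]) for e in expanded
            ) / len(expanded),
        )
        expanded.append(best)
    return expanded


# Toy embeddings (hypothetical): cities cluster together, "apple" does not.
emb = {
    "Paris": [0.9, 0.1],
    "London": [0.8, 0.2],
    "Berlin": [0.85, 0.15],
    "apple": [0.1, 0.9],
}
print(expand(["Paris"], emb, rounds=2))  # → ['Paris', 'Berlin', 'London']
```

A seed set of just {"Paris"} expands to the other city entities while leaving the unrelated "apple" out, illustrating the sparse-supervision setting the abstract describes; GBN's contribution is to score candidates with corpus-level (global) semantics rather than only such local similarity.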
Anthology ID:
2020.findings-emnlp.331
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | Findings
Publisher:
Association for Computational Linguistics
Pages:
3705–3714
URL:
https://aclanthology.org/2020.findings-emnlp.331
DOI:
10.18653/v1/2020.findings-emnlp.331
Cite (ACL):
Lingyong Yan, Xianpei Han, Ben He, and Le Sun. 2020. Global Bootstrapping Neural Network for Entity Set Expansion. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3705–3714, Online. Association for Computational Linguistics.
Cite (Informal):
Global Bootstrapping Neural Network for Entity Set Expansion (Yan et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.331.pdf
Code
 lingyongyan/bootstrapping_pre-train
Data
DocRED