Joint Bootstrapping Machines for High Confidence Relation Extraction

Pankaj Gupta, Benjamin Roth, Hinrich Schütze


Abstract
Semi-supervised bootstrapping techniques for relationship extraction from text iteratively expand a set of initial seed instances. Due to the lack of labeled data, a key challenge in bootstrapping is semantic drift: if a false positive instance is added during an iteration, then all following iterations are contaminated. We introduce BREX, a new bootstrapping method that protects against such contamination by highly effective confidence assessment. This is achieved by using entity and template seeds jointly (as opposed to just one as in previous work), by expanding entities and templates in parallel and in a mutually constraining fashion in each iteration, and by introducing higher-quality similarity measures for templates. Experimental results show that BREX achieves an F1 that is 0.13 (0.87 vs. 0.74) better than the state of the art for four relationships.
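To make the notion of semantic drift concrete, here is a minimal toy sketch of a generic bootstrapping loop with a confidence threshold. It is not the BREX algorithm: the function name `bootstrap`, the `min_conf` parameter, the scoring rule (the share of a template's extracted pairs that are already trusted), and the tiny `CORPUS` are all assumptions made for illustration only.

```python
# Toy corpus of (entity1, template, entity2) triples; invented for illustration.
CORPUS = [
    ("Google", "acquired", "YouTube"),
    ("Google", "bought", "YouTube"),
    ("Facebook", "acquired", "Instagram"),
    ("Facebook", "bought", "Instagram"),
    ("Microsoft", "bought", "LinkedIn"),
    ("Microsoft", "partnered with", "LinkedIn"),
    ("Apple", "partnered with", "IBM"),
    ("Google", "partnered with", "NASA"),
]


def bootstrap(corpus, seed_pairs, seed_templates, min_conf, max_iters=10):
    """Generic bootstrapping sketch (hypothetical; not the paper's method)."""
    pairs, templates = set(seed_pairs), set(seed_templates)

    # Index the corpus: which entity pairs does each template connect?
    by_template = {}
    for e1, template, e2 in corpus:
        by_template.setdefault(template, set()).add((e1, e2))

    for _ in range(max_iters):
        # Step 1: trusted templates contribute the pairs they extract.
        new_pairs = set()
        for t in templates:
            new_pairs |= by_template.get(t, set()) - pairs
        pairs |= new_pairs

        # Step 2: an unseen template is accepted only if enough of the
        # pairs it extracts are already trusted.
        new_templates = {
            t
            for t, extracted in by_template.items()
            if t not in templates
            and len(extracted & pairs) / len(extracted) >= min_conf
        }
        templates |= new_templates

        # Stop once an iteration adds nothing new.
        if not new_pairs and not new_templates:
            break
    return pairs, templates


seeds = {("Google", "YouTube")}
strict_pairs, strict_templates = bootstrap(CORPUS, seeds, {"acquired"}, min_conf=0.6)
loose_pairs, loose_templates = bootstrap(CORPUS, seeds, {"acquired"}, min_conf=0.3)
```

With the stricter threshold the sketch stops at the three acquisition pairs and the templates "acquired" and "bought". With the looser threshold, a single shared pair is enough to admit "partnered with", after which unrelated pairs such as Apple and IBM are accepted as well: one weak decision contaminates the iterations that follow.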
Anthology ID:
N18-1003
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
26–36
URL:
https://aclanthology.org/N18-1003
DOI:
10.18653/v1/N18-1003
Cite (ACL):
Pankaj Gupta, Benjamin Roth, and Hinrich Schütze. 2018. Joint Bootstrapping Machines for High Confidence Relation Extraction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 26–36, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Joint Bootstrapping Machines for High Confidence Relation Extraction (Gupta et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1003.pdf
Video:
https://aclanthology.org/N18-1003.mp4
Code:
pgcool/Joint-Bootstrapping-Machines