A Practical Incremental Learning Framework For Sparse Entity Extraction

Hussein Al-Olimat, Steven Gustafson, Jason Mackay, Krishnaprasad Thirunarayan, Amit Sheth


Abstract
This work addresses challenges arising from extracting entities from textual data, including the high cost of data annotation, model accuracy, selecting appropriate evaluation criteria, and the overall quality of annotation. We present a framework that integrates Entity Set Expansion (ESE) and Active Learning (AL) to reduce the annotation cost of sparse data and provide an online evaluation method as feedback. This incremental and interactive learning framework allows for rapid annotation and subsequent extraction of sparse data while maintaining high accuracy. We evaluate our framework on three publicly available datasets and show that it drastically reduces the cost of sparse entity annotation by an average of 85% and 45% to reach 0.9 and 1.0 F-Scores respectively. Moreover, the method exhibited robust performance across all datasets.
Anthology ID:
C18-1059
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
700–710
URL:
https://aclanthology.org/C18-1059
Cite (ACL):
Hussein Al-Olimat, Steven Gustafson, Jason Mackay, Krishnaprasad Thirunarayan, and Amit Sheth. 2018. A Practical Incremental Learning Framework For Sparse Entity Extraction. In Proceedings of the 27th International Conference on Computational Linguistics, pages 700–710, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
A Practical Incremental Learning Framework For Sparse Entity Extraction (Al-Olimat et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1059.pdf
Code:
halolimat/SpExtor
Data:
CoNLL-2003