Entity Disambiguation on a Tight Labeling Budget

Audi Primadhanty, Ariadna Quattoni


Abstract
Many real-world NLP applications face the challenge of training an entity disambiguation model for a specific domain with a small labeling budget. In this setting there is often access to a large unlabeled pool of documents. It is then natural to ask: which samples should be selected for annotation? In this paper we propose a solution that combines feature diversity with low-rank correction. Our sampling strategy is formulated in the context of bilinear tensor models. Our experiments show that the proposed approach can significantly reduce the amount of labeled data necessary to achieve a given level of performance.
Anthology ID:
2023.findings-emnlp.479
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7208–7215
URL:
https://aclanthology.org/2023.findings-emnlp.479
DOI:
10.18653/v1/2023.findings-emnlp.479
Cite (ACL):
Audi Primadhanty and Ariadna Quattoni. 2023. Entity Disambiguation on a Tight Labeling Budget. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7208–7215, Singapore. Association for Computational Linguistics.
Cite (Informal):
Entity Disambiguation on a Tight Labeling Budget (Primadhanty & Quattoni, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.479.pdf