Simple Questions Generate Named Entity Recognition Datasets

Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang


Abstract
Recent named entity recognition (NER) models often rely on human-annotated datasets, which require extensive professional knowledge of the target domain and entities. This work introduces an ask-to-generate approach, which automatically generates NER datasets by asking simple natural language questions (e.g., “Which disease?”) to an open-domain question answering system. Despite using fewer training resources, our models trained solely on the generated datasets outperform strong low-resource models by 19.5 F1 points across six popular NER benchmarks. Our models are also competitive with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks and achieve new state-of-the-art performance.
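The ask-to-generate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `answer_question` is a hypothetical stub standing in for an open-domain question answering system, its canned sentences are invented for illustration, and the whitespace tokenization, the `DISEASE` label name, and the BIO tagging scheme are all assumptions.

```python
from typing import List, Tuple

# Hypothetical stand-in for an open-domain QA system. A real system would
# retrieve evidence sentences and answer phrases from a large corpus; the
# sentences below are made up for illustration only.
def answer_question(question: str) -> List[Tuple[str, str]]:
    canned = {
        "Which disease?": [
            ("The patient was diagnosed with type 2 diabetes last year .",
             "type 2 diabetes"),
            ("Malaria is transmitted by mosquitoes .", "Malaria"),
        ],
    }
    return canned.get(question, [])


def bio_tags(tokens: List[str], answer_tokens: List[str], label: str) -> List[str]:
    """Tag the first occurrence of the answer phrase with B-/I- labels."""
    tags = ["O"] * len(tokens)
    n = len(answer_tokens)
    if n == 0:
        return tags
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer_tokens:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"
            break
    return tags


def generate_dataset(question: str, label: str) -> List[Tuple[List[str], List[str]]]:
    """Turn QA answers for one question into (tokens, tags) training examples."""
    dataset = []
    for sentence, answer in answer_question(question):
        tokens = sentence.split()  # assumption: whitespace tokenization
        dataset.append((tokens, bio_tags(tokens, answer.split(), label)))
    return dataset


examples = generate_dataset("Which disease?", "DISEASE")
for tokens, tags in examples:
    print(list(zip(tokens, tags)))
```

Each question maps to one entity type, so a dataset with several types would, under the same assumptions, be built by calling `generate_dataset` once per question and merging the results.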
Anthology ID:
2022.emnlp-main.417
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
6220–6236
URL:
https://aclanthology.org/2022.emnlp-main.417
DOI:
10.18653/v1/2022.emnlp-main.417
Cite (ACL):
Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, and Jaewoo Kang. 2022. Simple Questions Generate Named Entity Recognition Datasets. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6220–6236, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Simple Questions Generate Named Entity Recognition Datasets (Kim et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.417.pdf