Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation

Joan Santoso, Patrick Sutanto, Billy Cahyadi, Esther Setiawan


Abstract
Named Entity Recognition (NER) is an important task, but achieving strong performance usually requires collecting a large amount of labeled data, which incurs high costs. In this paper, we propose using open-source Large Language Models (LLMs) to generate NER data from only a few labeled examples, reducing the cost of human annotation. Our proposed method is simple and performs well with only a few labeled data points. Experimental results on diverse low-resource NER datasets show that our data generation method significantly improves over the baseline. Additionally, our method can be used to augment datasets with class-imbalance problems, and it consistently improves model performance on the macro-F1 metric.
Anthology ID:
2024.findings-acl.575
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9652–9667
URL:
https://aclanthology.org/2024.findings-acl.575
Cite (ACL):
Joan Santoso, Patrick Sutanto, Billy Cahyadi, and Esther Setiawan. 2024. Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation. In Findings of the Association for Computational Linguistics ACL 2024, pages 9652–9667, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation (Santoso et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.575.pdf