Patrick Sutanto


2024

Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation
Joan Santoso | Patrick Sutanto | Billy Cahyadi | Esther Setiawan
Findings of the Association for Computational Linguistics: ACL 2024

Named Entity Recognition (NER) is an important task, but achieving strong performance usually requires collecting a large amount of labeled data, which incurs high costs. In this paper, we propose using open-source Large Language Models (LLMs) to generate NER data from only a few labeled examples, reducing the cost of human annotation. Our proposed method is simple and performs well with only a few labeled data points. Experimental results on diverse low-resource NER datasets show that our data generation method significantly improves over the baseline. Additionally, our method can be used to augment datasets with class-imbalance problems, and it consistently improves model performance on the macro-F1 metric.