Want To Reduce Labeling Cost? GPT-3 Can Help

Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng


Abstract
Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3 with 170 billion parameters has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than using labels from humans. Furthermore, we propose a novel framework of combining pseudo labels from GPT-3 with human labels, which leads to even better performance. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.
Anthology ID:
2021.findings-emnlp.354
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venues:
EMNLP | Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
4195–4205
Language:
URL:
https://aclanthology.org/2021.findings-emnlp.354
DOI:
10.18653/v1/2021.findings-emnlp.354
Bibkey:
Cite (ACL):
Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, and Michael Zeng. 2021. Want To Reduce Labeling Cost? GPT-3 Can Help. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4195–4205, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Want To Reduce Labeling Cost? GPT-3 Can Help (Wang et al., Findings 2021)
Copy Citation:
PDF:
https://aclanthology.org/2021.findings-emnlp.354.pdf
Data
AG NewsSQuADSST