Is ChatGPT the ultimate Data Augmentation Algorithm?

Frédéric Piedboeuf, Philippe Langlais


Abstract
In the aftermath of GPT-3.5, commonly known as ChatGPT, researchers have attempted to assess its capacity for lowering annotation cost by doing zero-shot learning, generating new data, or replacing human annotators. Some studies have also investigated its use for data augmentation (DA), but only in limited contexts, which still leaves open the question of how ChatGPT performs compared to state-of-the-art algorithms. In this paper, we use ChatGPT to create new data both with paraphrasing and with zero-shot generation, and compare it to seven other algorithms. We show that while ChatGPT performs exceptionally well on some simpler data, it overall does not perform better than the other algorithms, yet demands much greater involvement from the practitioner because ChatGPT often refuses to answer due to sensitive content in the datasets.
Anthology ID:
2023.findings-emnlp.1044
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15606–15615
URL:
https://aclanthology.org/2023.findings-emnlp.1044
DOI:
10.18653/v1/2023.findings-emnlp.1044
Cite (ACL):
Frédéric Piedboeuf and Philippe Langlais. 2023. Is ChatGPT the ultimate Data Augmentation Algorithm?. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 15606–15615, Singapore. Association for Computational Linguistics.
Cite (Informal):
Is ChatGPT the ultimate Data Augmentation Algorithm? (Piedboeuf & Langlais, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.1044.pdf