CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients

Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim


Abstract
Korean morphological variations present unique opportunities and challenges in natural language processing (NLP), necessitating an advanced understanding of morpheme-based sentence construction. The complexity of morphological variations allows for diverse sentence forms based on the syntactic-semantic integration of functional morphemes (i.e., affixes) to lexical morphemes (i.e., roots). With this in mind, we propose a method - CHEF, replicating the morphological transformations inherent in sentences based on lexical and functional morpheme combinations through generative data augmentation. CHEF operates using a morpheme blender and a label discriminator, thereby enhancing the diversity of Korean sentence forms by capturing the properties of agglutination while maintaining label consistency. We conduct experiments on Korean multiple classification datasets, improving model performance in full- and few-shot settings. Our proposed method boosts performance beyond the preceding data augmentation methods without incurring external data usage. We demonstrate that our approach achieves comparable results yielded by augmentation techniques that use large language models (LLMs).
Anthology ID:
2023.emnlp-main.367
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
6014–6029
Language:
URL:
https://aclanthology.org/2023.emnlp-main.367
DOI:
10.18653/v1/2023.emnlp-main.367
Bibkey:
Cite (ACL):
Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, and Heuiseok Lim. 2023. CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6014–6029, Singapore. Association for Computational Linguistics.
Cite (Informal):
CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients (Seo et al., EMNLP 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.emnlp-main.367.pdf
Video:
 https://aclanthology.org/2023.emnlp-main.367.mp4