Tianze Luo
2024
Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges
Bosheng Ding | Chengwei Qin | Ruochen Zhao | Tianze Luo | Xinze Li | Guizhen Chen | Wenhan Xia | Junjie Hu | Anh Tuan Luu | Shafiq Joty
Findings of the Association for Computational Linguistics: ACL 2024
In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This survey explores the transformative impact of LLMs on DA, particularly addressing the unique challenges and opportunities they present in the context of natural language processing (NLP) and beyond. From both data and learning perspectives, we examine various strategies that utilize LLMs for data augmentation, including a novel exploration of learning paradigms where LLM-generated data is used for diverse forms of further training. Additionally, this paper highlights the primary open challenges faced in this domain, ranging from controllable data augmentation to multi-modal data augmentation. This survey highlights a paradigm shift introduced by LLMs in DA, and aims to serve as a comprehensive guide for researchers and practitioners.
2022
Domain Confused Contrastive Learning for Unsupervised Domain Adaptation
Quanyu Long | Tianze Luo | Wenya Wang | Sinno Pan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
In this work, we study Unsupervised Domain Adaptation (UDA) in a challenging self-supervised approach. One of the difficulties is how to learn task discrimination in the absence of target labels. Unlike previous literature, which directly aligns cross-domain distributions or leverages reverse gradient, we propose Domain Confused Contrastive Learning (DCCL), which can bridge the source and target domains via domain puzzles and retain discriminative representations after adaptation. Technically, DCCL searches for the most domain-challenging direction and crafts domain-confused augmentations as positive pairs. It then contrastively encourages the model to pull representations towards the other domain, thus learning more stable and effective domain invariances. We also investigate whether contrastive learning necessarily helps with UDA when performing other data augmentations. Extensive experiments demonstrate that DCCL significantly outperforms baselines; a further ablation study and analysis also show the effectiveness and availability of DCCL.