Enhancing Task-Specific Distillation in Small Data Regimes through Language Generation

Husam Quteineh, Spyridon Samothrakis, Richard Sutcliffe


Abstract
Large-scale pretrained language models have led to significant improvements in Natural Language Processing. Unfortunately, they come at the cost of high computational and storage requirements that complicate their deployment on low-resource devices. This issue can be addressed by distilling knowledge from larger models to smaller ones through pseudo-labels on task-specific datasets. However, this can be difficult for tasks with very limited data. To overcome this challenge, we present a novel approach in which knowledge is distilled from a teacher model to a student model through the generation of synthetic data. To do so, we first fine-tune the teacher and student models, as well as a Natural Language Generation (NLG) model, on the target task dataset. The student and teacher then work together to condition the NLG model to generate examples that enhance the performance of the student. We test our approach with two data generation methods: a) targeted generation using the Monte Carlo Tree Search (MCTS) algorithm, and b) a Non-Targeted Text Generation (NTTG) method. We evaluate the effectiveness of both against a baseline that uses the BERT model for data augmentation through random word replacement. On the SST-2, MRPC, YELP-2, DBpedia, and TREC-6 datasets, our approach consistently yields considerable improvements over the word-replacement baseline.
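The abstract does not include code; as a minimal sketch of the pseudo-label distillation step it describes, the snippet below implements a standard temperature-scaled soft-target loss in the style of Hinton et al. The function names, the temperature `T`, and the mixing weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T produces a softer distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Blend KL(teacher || student) on temperature-softened distributions
    with cross-entropy on the hard (pseudo-)labels."""
    p_t = softmax(teacher_logits, T)   # soft targets from the teacher
    p_s = softmax(student_logits, T)   # student's softened predictions
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(hard_labels)), hard_labels] + 1e-12)
    # T**2 compensates for the gradient scaling introduced by the temperature.
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce))
```

In a pipeline like the one the paper describes, `teacher_logits` would come from the fine-tuned teacher scoring the synthetic examples produced by the NLG model, and the student would be trained to minimize this loss on them.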
Anthology ID:
2022.coling-1.520
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5955–5965
URL:
https://aclanthology.org/2022.coling-1.520
Cite (ACL):
Husam Quteineh, Spyridon Samothrakis, and Richard Sutcliffe. 2022. Enhancing Task-Specific Distillation in Small Data Regimes through Language Generation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5955–5965, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Enhancing Task-Specific Distillation in Small Data Regimes through Language Generation (Quteineh et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.520.pdf
Data
GLUE, MRPC, SST