Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks

Trapit Bansal; Rishikesh Jha; Andrew McCallum

doi:10.18653/v1/2020.coling-main.448

Abstract

Pre-trained transformer models have shown enormous success in improving performance on several downstream tasks. However, fine-tuning on a new task still requires large amounts of task-specific labeled data to achieve good performance. We consider this problem of learning to generalize to new tasks, with a few examples, as a meta-learning problem. While meta-learning has shown tremendous progress in recent years, its application is still limited to simulated problems or problems with limited diversity across tasks. We develop a novel method, LEOPARD, which enables optimization-based meta-learning across tasks with a different number of classes, and evaluate different methods on generalization to diverse NLP classification tasks. LEOPARD is trained with the state-of-the-art transformer architecture and shows better generalization to tasks not seen at all during training, with as few as 4 examples per label. Across 17 NLP tasks, including diverse domains of entity typing, natural language inference, sentiment analysis, and several other text classification tasks, we show that LEOPARD learns better initial parameters for few-shot learning than self-supervised pre-training or multi-task training, outperforming many strong baselines, for example, yielding 14.6% average relative gain in accuracy on unseen tasks with only 4 examples per label.

Anthology ID: 2020.coling-main.448
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 5108–5123
URL: https://aclanthology.org/2020.coling-main.448
DOI: 10.18653/v1/2020.coling-main.448
Cite (ACL): Trapit Bansal, Rishikesh Jha, and Andrew McCallum. 2020. Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5108–5123, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks (Bansal et al., COLING 2020)
PDF: https://aclanthology.org/2020.coling-main.448.pdf
Code: iesl/leopard
Data: GLUE, SST
