Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models

Pierre Colombo, Victor Pellegrain, Malik Boudiaf, Myriam Tami, Victor Storchan, Ismail Ayed, Pablo Piantanida


Abstract
Proprietary and closed APIs are becoming increasingly common for processing natural language, and are impacting the practical applications of natural language processing, including few-shot classification. Few-shot classification involves training a model to perform a new classification task with only a handful of labeled examples. This paper presents three contributions. First, we introduce a scenario where the embedding of a pre-trained model is served through a gated API with compute-cost and data-privacy constraints. Second, we propose transductive inference, a learning paradigm that has been overlooked by the NLP community. Transductive inference, unlike traditional inductive learning, leverages the statistics of unlabelled data. We also introduce a new parameter-free transductive regularizer based on the Fisher-Rao loss, which can be used on top of the gated API embeddings. This method fully utilizes unlabelled data, does not share any label with the third-party API provider, and could serve as a baseline for future research. Third, we propose an improved experimental setting and compile a benchmark of eight datasets involving multiclass classification in four different languages, with up to 151 classes. We evaluate our methods using eight backbone models, along with an episodic evaluation over 1,000 episodes, which demonstrates the superiority of transductive inference over the standard inductive setting.
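To make the inductive/transductive distinction in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes embeddings have already been fetched from an API as NumPy arrays, and it swaps in a simple soft prototype refinement as the transductive step rather than the paper's Fisher-Rao regularizer. All function names are hypothetical. The helper `fisher_rao_distance` only illustrates the standard Fisher-Rao distance between two categorical distributions; how the paper turns it into a loss is not shown here.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def class_prototypes(support_x, support_y, n_classes):
    # Mean embedding per class; assumes every class has at least one support example.
    return np.stack([support_x[support_y == c].mean(axis=0) for c in range(n_classes)])

def squared_distances(x, protos):
    # Pairwise squared Euclidean distances, shape (n_points, n_classes).
    return ((x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)

def inductive_predict(support_x, support_y, query_x, n_classes):
    # Inductive: each query is classified independently from the labeled support set.
    protos = class_prototypes(support_x, support_y, n_classes)
    return squared_distances(query_x, protos).argmin(axis=1)

def transductive_predict(support_x, support_y, query_x, n_classes,
                         n_iter=10, temperature=1.0):
    # Transductive (illustrative stand-in): prototypes are re-estimated using
    # soft assignments of the whole unlabeled query batch.
    protos = class_prototypes(support_x, support_y, n_classes)
    onehot = np.eye(n_classes)[support_y]
    feats = np.concatenate([support_x, query_x], axis=0)
    for _ in range(n_iter):
        soft = softmax(-squared_distances(query_x, protos) / temperature, axis=1)
        weights = np.concatenate([onehot, soft], axis=0)
        protos = (weights.T @ feats) / weights.sum(axis=0)[:, None]
    return squared_distances(query_x, protos).argmin(axis=1)

def fisher_rao_distance(p, q):
    # Fisher-Rao distance between categorical distributions:
    # 2 * arccos(sum_i sqrt(p_i * q_i)).
    bc = np.sqrt(p * q).sum(axis=-1)
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))
```

In this sketch the labels never leave the local machine: only the texts would be sent to the embedding provider, and both classifiers run on the returned vectors.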
Anthology ID:
2023.emnlp-main.257
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4214–4231
URL:
https://aclanthology.org/2023.emnlp-main.257
DOI:
10.18653/v1/2023.emnlp-main.257
Cite (ACL):
Pierre Colombo, Victor Pellegrain, Malik Boudiaf, Myriam Tami, Victor Storchan, Ismail Ayed, and Pablo Piantanida. 2023. Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4214–4231, Singapore. Association for Computational Linguistics.
Cite (Informal):
Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models (Colombo et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.257.pdf
Video:
https://aclanthology.org/2023.emnlp-main.257.mp4