César Parra-Rojas


2023

A weakly supervised textual entailment approach to zero-shot text classification
Marc Pàmies | Joan Llop | Francesco Multari | Nicolau Duran-Silva | César Parra-Rojas | Aitor Gonzalez-Agirre | Francesco Alessandro Massucci | Marta Villegas
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Zero-shot text classification is a widely studied task that addresses the lack of annotated data. The most common approach is to reformulate it as a textual entailment problem, which enables classification into unseen classes. This work explores an effective approach that trains on a weakly supervised dataset generated from traditional classification data. We empirically study the relation between performance on the entailment task, which serves as a proxy, and performance on the target zero-shot text classification task. Our findings reveal that there is no linear correlation between the two tasks, to the extent that prolonging fine-tuning can be detrimental even while the model is still learning. We therefore propose a straightforward method for stopping training at the right time. As a proof of concept, we introduce a domain-specific zero-shot text classifier trained on Microsoft Academic Graph data. The model, called SCIroShot, achieves state-of-the-art performance in the scientific domain and competitive results in other areas. Both the model and the evaluation benchmark are publicly available on HuggingFace and GitHub.