FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang


Abstract
Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully crafted class descriptions to obtain class-specific keywords but also require a substantial amount of unlabeled data and take a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representations to retrieve class-relevant documents from an external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions because it no longer needs to expand each class description into a set of class-specific keywords. Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in classification accuracy and often trains orders of magnitude faster.
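The retrieve-then-train idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the normalized bag-of-words `embed` function stands in for a real dense encoder, the fixed cutoff `k` stands in for the paper's subset selection, and the nearest-centroid classifier stands in for whatever classifier is actually trained. All function names and the toy corpus are hypothetical.

```python
import numpy as np

def build_vocab(texts):
    """Map every whitespace-separated token to a column index."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}

def embed(texts, vocab):
    """Toy stand-in for a dense encoder: L2-normalized bag-of-words counts."""
    mat = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for word in text.lower().split():
            if word in vocab:  # out-of-vocabulary tokens are ignored
                mat[row, vocab[word]] += 1.0
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.maximum(norms, 1e-12)

def retrieve(class_descs, corpus, vocab, k):
    """Pseudo-label each document with its most similar class description,
    then keep at most k documents per class, ranked by cosine similarity."""
    sims = embed(class_descs, vocab) @ embed(corpus, vocab).T
    best = sims.argmax(axis=0)
    selected = []
    for label in range(len(class_descs)):
        candidates = [i for i in range(len(corpus))
                      if best[i] == label and sims[label, i] > 0]
        candidates.sort(key=lambda i: -sims[label, i])
        selected.extend((i, label) for i in candidates[:k])
    return selected

def fit_centroids(selected, corpus, vocab, n_classes):
    """Nearest-centroid 'training'; assumes every class retrieved a document."""
    doc_vecs = embed(corpus, vocab)
    return np.stack([
        doc_vecs[[i for i, y in selected if y == c]].mean(axis=0)
        for c in range(n_classes)
    ])

def predict(texts, centroids, vocab):
    """Assign each text to the class whose centroid scores highest."""
    return (embed(texts, vocab) @ centroids.T).argmax(axis=1).tolist()

# Hypothetical class descriptions and unlabeled corpus.
class_descs = ["sports", "politics"]
corpus = [
    "the team won the sports match",
    "sports fans cheered the team",
    "the election shaped politics",
    "politics and the senate vote",
    "weather was mild today",
]
vocab = build_vocab(class_descs + corpus)
selected = retrieve(class_descs, corpus, vocab, k=2)
centroids = fit_centroids(selected, corpus, vocab, n_classes=len(class_descs))
predictions = predict(["the team lost the match", "senate election"], centroids, vocab)
print(selected, predictions)
```

In this toy setup the document about the weather matches neither class description and is left out of the training subset, which is the point of retrieving only class-relevant documents before training.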
Anthology ID:
2022.emnlp-main.313
Original:
2022.emnlp-main.313v1
Version 2:
2022.emnlp-main.313v2
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4746–4758
URL:
https://aclanthology.org/2022.emnlp-main.313
DOI:
10.18653/v1/2022.emnlp-main.313
Cite (ACL):
Tingyu Xia, Yue Wang, Yuan Tian, and Yi Chang. 2022. FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 4746–4758, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification (Xia et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.313.pdf