Small-Text: Active Learning for Text Classification in Python

Christopher Schröder, Lydia Müller, Andreas Niekler, Martin Potthast


Abstract
We introduce small-text, an easy-to-use active learning library that offers pool-based active learning for single- and multi-label text classification in Python. It features numerous pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow classifiers, query strategies, and stopping criteria to be combined freely, facilitating quick mix-and-match and enabling rapid development of both active learning experiments and applications. To make various classifiers and query strategies accessible for active learning, small-text integrates several well-known machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face transformers. The PyTorch and transformers integrations are optionally installable extensions, so GPUs can be used but are not required. Using this new library, we investigate the performance of the recently published SetFit training paradigm and compare it to vanilla transformer fine-tuning, finding that SetFit matches the latter in classification accuracy while outperforming it in area under the curve. The library is available under the MIT License at https://github.com/webis-de/small-text, in version 1.3.0 at the time of writing.
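The pool-based setup with interchangeable query strategies and stopping criteria can be sketched roughly as follows. This is a minimal, stdlib-only illustration built on our own assumptions and is not small-text's API. All names (`prediction_entropy`, `least_confidence`, `run_active_learning`, `soft_threshold_model`) are hypothetical. The stopping criterion here is just a fixed iteration budget, and a toy 1-D classifier stands in for a real text classifier.

```python
import math
from typing import Callable, Dict, List, Sequence

Probas = Sequence[Sequence[float]]  # one row of class probabilities per instance


def _top_n(scores: Sequence[float], n: int) -> List[int]:
    # Indices of the n highest scores; ties are broken by the lower index.
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:n]


def prediction_entropy(probas: Probas, n: int) -> List[int]:
    """Hypothetical query strategy: pick the n rows with maximal predictive entropy."""
    scores = [-sum(p * math.log(p) for p in row if p > 0) for row in probas]
    return _top_n(scores, n)


def least_confidence(probas: Probas, n: int) -> List[int]:
    """Hypothetical query strategy: pick the n rows whose top class is least probable."""
    return _top_n([1.0 - max(row) for row in probas], n)


def soft_threshold_model(train_x: Sequence[float], train_y: Sequence[int],
                         xs: Sequence[float]) -> Probas:
    """Toy stand-in classifier: a soft threshold halfway between the class means.

    Assumes class 1 has the larger mean; returns uniform probabilities until
    both classes have at least one labeled example.
    """
    zeros = [x for x, y in zip(train_x, train_y) if y == 0]
    ones = [x for x, y in zip(train_x, train_y) if y == 1]
    if not zeros or not ones:
        return [[0.5, 0.5] for _ in xs]
    t = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    result = []
    for x in xs:
        p1 = 1.0 / (1.0 + math.exp(-10.0 * (x - t)))
        result.append([1.0 - p1, p1])
    return result


def run_active_learning(
    pool: Sequence[float],
    oracle: Callable[[float], int],
    train_and_predict: Callable[[Sequence[float], Sequence[int], Sequence[float]], Probas],
    query_strategy: Callable[[Probas, int], List[int]],
    batch_size: int,
    max_iterations: int,
) -> Dict[int, int]:
    """Generic pool-based loop: train, score the unlabeled pool, query, label, repeat."""
    labeled: Dict[int, int] = {}  # pool index -> label obtained from the oracle
    for _ in range(max_iterations):  # simplest stopping criterion: a fixed budget
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        train_x = [pool[i] for i in labeled]
        train_y = [labeled[i] for i in labeled]
        probas = train_and_predict(train_x, train_y, [pool[i] for i in unlabeled])
        for j in query_strategy(probas, min(batch_size, len(unlabeled))):
            idx = unlabeled[j]  # map position in the unlabeled list back to the pool
            labeled[idx] = oracle(pool[idx])
    return labeled


# Usage: a toy pool of 1-D "documents" labeled 1 when the value exceeds 0.5.
pool = [i / 10 for i in range(11)]
labeled = run_active_learning(pool, lambda x: int(x > 0.5), soft_threshold_model,
                              prediction_entropy, batch_size=2, max_iterations=3)
```

Switching strategies only means passing `least_confidence` instead of `prediction_entropy` as `query_strategy`. That decoupling is the kind of mix-and-match the abstract describes. In this toy run the model remains uninformative until both classes have been labeled, which is why real setups typically begin from a small initial labeled set.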
Anthology ID:
2023.eacl-demo.11
Original:
2023.eacl-demo.11v1
Version 2:
2023.eacl-demo.11v2
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Danilo Croce, Luca Soldaini
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
84–95
URL:
https://aclanthology.org/2023.eacl-demo.11
DOI:
10.18653/v1/2023.eacl-demo.11
Award:
EACL Best System Demonstration
Cite (ACL):
Christopher Schröder, Lydia Müller, Andreas Niekler, and Martin Potthast. 2023. Small-Text: Active Learning for Text Classification in Python. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 84–95, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Small-Text: Active Learning for Text Classification in Python (Schröder et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-demo.11.pdf
Video:
https://aclanthology.org/2023.eacl-demo.11.mp4
Code:
webis-de/small-text