Smooth Sailing: Improving Active Learning for Pre-trained Language Models with Representation Smoothness Analysis

Josip Jukić; Jan Šnajder

Abstract

Developed to alleviate prohibitive labeling costs, active learning (AL) methods aim to reduce label complexity in supervised learning. While recent work has demonstrated the benefit of using AL in combination with large pre-trained language models (PLMs), it has often overlooked the practical challenges that hinder the effectiveness of AL. We address these challenges by leveraging representation smoothness analysis to ensure AL is feasible, that is, both effective and practicable. Firstly, we propose an early stopping technique that does not require a validation set – often unavailable in realistic AL conditions – and observe significant improvements over random sampling across multiple datasets and AL methods. Further, we find that task adaptation improves AL, whereas standard short fine-tuning in AL does not provide improvements over random sampling. Our work demonstrates the usefulness of representation smoothness analysis for AL and introduces an AL stopping criterion that reduces label complexity.

Anthology ID: 2023.clasp-1.2
Original: 2023.clasp-1.2v1
Version 2: 2023.clasp-1.2v2
Volume: Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
Month: September
Year: 2023
Address: Gothenburg, Sweden
Editors: Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, Simon Dobnik
Venue: CLASP
SIG: SIGSEM
Publisher: Association for Computational Linguistics
Pages: 11–24
URL: https://aclanthology.org/2023.clasp-1.2/
Cite (ACL): Josip Jukić and Jan Šnajder. 2023. Smooth Sailing: Improving Active Learning for Pre-trained Language Models with Representation Smoothness Analysis. In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 11–24, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal): Smooth Sailing: Improving Active Learning for Pre-trained Language Models with Representation Smoothness Analysis (Jukić & Šnajder, CLASP 2023)
PDF: https://aclanthology.org/2023.clasp-1.2.pdf
