ALVIN: Active Learning Via INterpolation

Michalis Korakakis, Andreas Vlachos, Adrian Weller


Abstract
Active Learning aims to minimize annotation effort by selecting the most useful instances from a pool of unlabeled data. However, typical active learning methods overlook the presence of distinct example groups within a class, whose prevalence may vary, e.g., in occupation classification datasets certain demographics are disproportionately represented in specific classes. This oversight causes models to rely on shortcuts for predictions, i.e., spurious correlations between input attributes and labels occurring in well-represented groups. To address this issue, we propose Active Learning Via INterpolation (ALVIN), which conducts intra-class interpolations between examples from under-represented and well-represented groups to create anchors, i.e., artificial points situated between the example groups in the representation space. By selecting instances close to the anchors for annotation, ALVIN identifies informative examples exposing the model to regions of the representation space that counteract the influence of shortcuts. Crucially, since the model considers these examples to be of high certainty, they are likely to be ignored by typical active learning methods. Experimental results on six datasets encompassing sentiment analysis, natural language inference, and paraphrase detection demonstrate that ALVIN outperforms state-of-the-art active learning methods in both in-distribution and out-of-distribution generalization.
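The two steps named in the abstract, interpolating within a class to form anchors and then selecting unlabeled instances near those anchors, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the representations are already available as NumPy arrays, that the under-represented and well-represented examples of one class have already been identified by some other means, that the mixing coefficient is drawn from a Beta distribution, and that closeness is measured by Euclidean distance. The function names `make_anchors` and `select_near_anchors` and the parameters `alpha` and `k` are hypothetical.

```python
import numpy as np


def make_anchors(under, well, n_anchors, alpha=2.0, seed=0):
    """Interpolate between representations of the SAME class.

    under: (n_u, d) representations of under-represented examples (assumed given).
    well:  (n_w, d) representations of well-represented examples (assumed given).
    The Beta(alpha, alpha) mixing coefficient is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    # Randomly pair one under-represented with one well-represented example.
    u = under[rng.integers(0, len(under), size=n_anchors)]
    w = well[rng.integers(0, len(well), size=n_anchors)]
    lam = rng.beta(alpha, alpha, size=(n_anchors, 1))
    # Each anchor lies on the line segment between its two endpoints.
    return lam * u + (1.0 - lam) * w


def select_near_anchors(anchors, pool, k=1):
    """Return indices of unlabeled pool points that are among the k nearest
    neighbours of at least one anchor (Euclidean distance, an assumption)."""
    dists = np.linalg.norm(anchors[:, None, :] - pool[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Deduplicate: several anchors may share the same nearest neighbour.
    return np.unique(nearest)


# Toy usage with 2-D representations of a single class.
under = np.array([[0.0, 0.0]])
well = np.array([[2.0, 0.0]])
pool = np.array([[1.0, 0.0], [50.0, 50.0]])
anchors = make_anchors(under, well, n_anchors=4)
to_annotate = select_near_anchors(anchors, pool, k=1)
```

In this toy setup every anchor falls between the two groups, so the pool point lying between them is chosen for annotation rather than the distant one. Note that the selection uses no uncertainty score, which fits the abstract's remark that such instances can be ones the model is already confident about.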
Anthology ID:
2024.emnlp-main.1265
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
22715–22728
URL:
https://aclanthology.org/2024.emnlp-main.1265
Cite (ACL):
Michalis Korakakis, Andreas Vlachos, and Adrian Weller. 2024. ALVIN: Active Learning Via INterpolation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 22715–22728, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
ALVIN: Active Learning Via INterpolation (Korakakis et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.1265.pdf