Deuce: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning

Jiaxin Guo, C. L. Philip Chen, Shuzhen Li, Tong Zhang


Abstract
Cold-start active learning (CSAL) selects valuable instances from an unlabeled dataset for manual annotation, providing high-quality data at low annotation cost for label-scarce text classification. However, existing CSAL methods overlook weak classes and hard representative examples, resulting in biased learning. To address these issues, this paper proposes a novel dual-diversity enhancing and uncertainty-aware (Deuce) framework for CSAL. Specifically, Deuce leverages a pretrained language model (PLM) to efficiently extract textual representations, class predictions, and predictive uncertainty. It then constructs a Dual-Neighbor Graph (DNG) that combines textual diversity with class diversity, ensuring a balanced data distribution. It further propagates uncertainty information via density-based clustering to select hard representative instances. By jointly exploiting dual-diversity and informativeness, Deuce selects data that are class-balanced and hard representative. Experiments on six NLP datasets demonstrate the superiority and efficiency of Deuce.
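
A minimal Python sketch of the selection pipeline the abstract describes, assuming precomputed PLM sentence embeddings and zero-shot class probabilities. The concrete choices below (k-nearest-neighbor graphs for both diversity views, DBSCAN for the density-based clustering, entropy weighted by graph degree as the informativeness score, and the helper name select_cold_start) are illustrative assumptions, not the paper's exact DNG construction or uncertainty-propagation rule:

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import kneighbors_graph

def select_cold_start(embeddings, class_probs, budget, k=10, eps=0.5):
    """Pick `budget` diverse, uncertain instances for annotation.

    embeddings  : (n, d) array of PLM text representations
    class_probs : (n, c) array of PLM class probabilities
    """
    # Dual-neighbor adjacency: union of a textual-diversity kNN graph
    # and a class-diversity kNN graph (a stand-in for the paper's DNG).
    g_text = kneighbors_graph(embeddings, k, mode="connectivity")
    g_class = kneighbors_graph(class_probs, k, mode="connectivity")
    dual = ((g_text + g_class) > 0).astype(float)

    # Predictive uncertainty as the entropy of the class distribution.
    entropy = -(class_probs * np.log(class_probs + 1e-12)).sum(axis=1)

    # Density-based clustering over the text embeddings; each cluster
    # acts as one region from which candidates are drawn.
    labels = DBSCAN(eps=eps, min_samples=k).fit_predict(embeddings)

    # Rank each cluster's members by uncertainty weighted with
    # dual-neighbor degree (a crude representativeness proxy).
    ranked = {}
    for cluster in np.unique(labels):
        if cluster == -1:          # skip DBSCAN noise points
            continue
        members = np.where(labels == cluster)[0]
        degree = np.asarray(dual[members].sum(axis=1)).ravel()
        order = np.argsort(entropy[members] * degree)[::-1]
        ranked[cluster] = list(members[order])

    # Round-robin across clusters so the batch stays balanced.
    selected = []
    while len(selected) < budget and any(ranked.values()):
        for queue in ranked.values():
            if queue and len(selected) < budget:
                selected.append(queue.pop(0))
    return selected

Round-robin selection across density clusters is one simple way to keep the annotation batch balanced; the paper's actual uncertainty propagation over the DNG is more involved.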
Anthology ID:
2024.tacl-1.94
Volume:
Transactions of the Association for Computational Linguistics, Volume 12
Year:
2024
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
1736–1754
URL:
https://aclanthology.org/2024.tacl-1.94/
DOI:
10.1162/tacl_a_00731
Cite (ACL):
Jiaxin Guo, C. L. Philip Chen, Shuzhen Li, and Tong Zhang. 2024. Deuce: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning. Transactions of the Association for Computational Linguistics, 12:1736–1754.
Cite (Informal):
Deuce: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning (Guo et al., TACL 2024)
PDF:
https://aclanthology.org/2024.tacl-1.94.pdf