Investigating Multi-source Active Learning for Natural Language Inference

Ard Snijders, Douwe Kiela, Katerina Margatina


Abstract
In recent years, active learning has been successfully applied to an array of NLP tasks. However, prior work often assumes that training and test data are drawn from the same distribution. This is problematic, as in real-life settings data may stem from several sources of varying relevance and quality. We show that four popular active learning schemes fail to outperform random selection when applied to unlabelled pools comprised of multiple data sources on the task of natural language inference. We reveal that uncertainty-based strategies perform poorly due to the acquisition of collective outliers, i.e., hard-to-learn instances that hamper learning and generalisation. When outliers are removed, strategies are found to recover and outperform random baselines. In further analysis, we find that collective outliers vary in form between sources, and show that hard-to-learn data is not always categorically harmful. Lastly, we leverage dataset cartography to introduce difficulty-stratified testing and find that different strategies are affected differently by example learnability and difficulty.
Anthology ID:
2023.eacl-main.160
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2187–2209
URL:
https://aclanthology.org/2023.eacl-main.160
DOI:
10.18653/v1/2023.eacl-main.160
Cite (ACL):
Ard Snijders, Douwe Kiela, and Katerina Margatina. 2023. Investigating Multi-source Active Learning for Natural Language Inference. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2187–2209, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Investigating Multi-source Active Learning for Natural Language Inference (Snijders et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.160.pdf
Software:
 2023.eacl-main.160.software.zip
Video:
 https://aclanthology.org/2023.eacl-main.160.mp4