Label Errors in BANKING77

Cecilia Ying, Stephen Thomas


Abstract
We investigate potential label errors in the popular BANKING77 dataset and their negative impact on intent classification methods. Motivated by our own negative results when constructing an intent classifier, we applied two automated approaches to identify potential label errors in the dataset. We found that over 1,400 (14%) of the 10,003 training utterances may have been incorrectly labelled. In a simple experiment, we found that removing the utterances with potential errors increased our intent classifier's F1-score by 4.5% in supervised classification and its Adjusted Rand Index by 8% in unsupervised classification. This paper serves as a warning about the potential presence of noisy labels in popular NLP datasets. Further study is needed to fully identify the breadth and depth of label errors in BANKING77 and other datasets.
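The abstract does not name the two detection approaches, so the following is only a minimal sketch of one plausible pipeline: confident learning via the cleanlab library, with a TF-IDF + logistic-regression classifier as a stand-in model. The classifier choice and the "banking77" Hugging Face Hub identifier are assumptions made for illustration, not the authors' published setup.

    # Sketch: confident-learning-based label-error detection on BANKING77.
    # NOTE: cleanlab, the TF-IDF + logistic-regression model, and the
    # "banking77" Hub identifier are assumptions, not the paper's pipeline.
    import numpy as np
    from datasets import load_dataset
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from cleanlab.filter import find_label_issues

    train = load_dataset("banking77", split="train")  # 10,003 utterances, 77 intents
    texts = train["text"]
    labels = np.array(train["label"])

    # Bag-of-ngrams features; any classifier exposing predict_proba works here.
    X = TfidfVectorizer(ngram_range=(1, 2), min_df=2).fit_transform(texts)

    # Confident learning requires out-of-sample predicted probabilities,
    # obtained here with 5-fold cross-validation.
    pred_probs = cross_val_predict(
        LogisticRegression(max_iter=1000),
        X, labels, cv=5, method="predict_proba",
    )

    # Indices of utterances whose given label disagrees with the model's
    # confident predictions, ranked from most to least suspicious.
    issue_idx = find_label_issues(
        labels, pred_probs, return_indices_ranked_by="self_confidence"
    )
    print(f"Flagged {len(issue_idx)} of {len(labels)} utterances as possible label errors")

Dropping the flagged utterances before retraining mirrors the "simple experiment" described above, though the number flagged will vary with the model and ranking method used.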
Anthology ID:
2022.insights-1.19
Volume:
Proceedings of the Third Workshop on Insights from Negative Results in NLP
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Shabnam Tafreshi, João Sedoc, Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Arjun Akula
Venue:
insights
Publisher:
Association for Computational Linguistics
Pages:
139–143
URL:
https://aclanthology.org/2022.insights-1.19
DOI:
10.18653/v1/2022.insights-1.19
Cite (ACL):
Cecilia Ying and Stephen Thomas. 2022. Label Errors in BANKING77. In Proceedings of the Third Workshop on Insights from Negative Results in NLP, pages 139–143, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Label Errors in BANKING77 (Ying & Thomas, insights 2022)
PDF:
https://aclanthology.org/2022.insights-1.19.pdf
Video:
https://aclanthology.org/2022.insights-1.19.mp4
Data
BANKING77