Handling Extreme Class Imbalance in Technical Logbook Datasets

Farhad Akhbardeh, Cecilia Ovesdotter Alm, Marcos Zampieri, Travis Desell


Abstract
Technical logbooks are a challenging and under-explored text type in automated event identification. These texts are typically short and written in non-standard yet technical language, posing challenges to off-the-shelf NLP pipelines. The granularity of issue types described in these datasets additionally leads to class imbalance, making it challenging for models to accurately predict which issue each logbook entry describes. In this paper, we focus on the problem of technical issue classification by considering logbook datasets from the automotive, aviation, and facilities maintenance domains. We adapt a feedback strategy from computer vision for handling extreme class imbalance, which resamples the training data based on the model's prediction errors. Our experiments show that this feedback strategy provides the best results, with statistical significance, for four different neural network models trained across a suite of seven technical logbook datasets from distinct technical domains. The feedback strategy is also generic and could be applied to any learning problem with substantial class imbalance.
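The abstract describes the feedback strategy only at a high level. As a rough illustration of the general idea of error-driven resampling (not the authors' exact procedure), the Python sketch below assumes that, after a training epoch, each example is weighted by its class's error rate on the training set and a new epoch's worth of indices is drawn with replacement. The function names `per_class_error` and `feedback_resample`, the `floor` parameter, and the class labels are all hypothetical.

```python
import random
from collections import Counter


def per_class_error(labels, predictions):
    """Fraction of misclassified training examples for each class."""
    totals = Counter(labels)
    wrong = Counter(y for y, p in zip(labels, predictions) if y != p)
    # Counter returns 0 for classes with no errors.
    return {c: wrong[c] / totals[c] for c in totals}


def feedback_resample(labels, predictions, n_samples, floor=0.05, seed=0):
    """Draw training indices with probability proportional to class error.

    Hypothetical sketch: each example's weight is its class's error rate
    plus a small `floor`, so classes the model already predicts correctly
    are still sampled occasionally.
    """
    errors = per_class_error(labels, predictions)
    weights = [errors[y] + floor for y in labels]
    rng = random.Random(seed)
    # Sampling with replacement, weighted by the per-example weights.
    return rng.choices(range(len(labels)), weights=weights, k=n_samples)


# Toy epoch with made-up labels: "engine" dominates and is predicted well,
# while the rare class "brakes" is always misclassified as "engine".
labels = ["engine"] * 8 + ["brakes"] * 2
predictions = ["engine"] * 10
print(per_class_error(labels, predictions))  # {'engine': 0.0, 'brakes': 1.0}
idx = feedback_resample(labels, predictions, n_samples=1000)
print(sum(i >= 8 for i in idx) / len(idx))  # share of "brakes" examples drawn
```

In this toy case "brakes" makes up 20% of the original data but receives a total weight of 2.1 out of 2.5, so it makes up about 84% of the resampled indices in expectation. In a training loop, the errors would be recomputed after each epoch and the data resampled again, which is what makes the strategy a feedback loop.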
Anthology ID:
2021.acl-long.312
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
4034–4045
URL:
https://aclanthology.org/2021.acl-long.312
DOI:
10.18653/v1/2021.acl-long.312
Cite (ACL):
Farhad Akhbardeh, Cecilia Ovesdotter Alm, Marcos Zampieri, and Travis Desell. 2021. Handling Extreme Class Imbalance in Technical Logbook Datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4034–4045, Online. Association for Computational Linguistics.
Cite (Informal):
Handling Extreme Class Imbalance in Technical Logbook Datasets (Akhbardeh et al., ACL 2021)
PDF:
https://aclanthology.org/2021.acl-long.312.pdf
Video:
https://aclanthology.org/2021.acl-long.312.mp4