@inproceedings{abdelhalim-etal-2023-training,
title = "Training Models on Oversampled Data and a Novel Multi-class Annotation Scheme for Dementia Detection",
author = "Abdelhalim, Nadine and
Abdelhalim, Ingy and
Batista-Navarro, Riza",
editor = "Naumann, Tristan and
Ben Abacha, Asma and
Bethard, Steven and
Roberts, Kirk and
Rumshisky, Anna",
booktitle = "Proceedings of the 5th Clinical Natural Language Processing Workshop",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.clinicalnlp-1.15",
doi = "10.18653/v1/2023.clinicalnlp-1.15",
pages = "118--124",
abstract = "This work introduces a novel three-class annotation scheme for text-based dementia classification in patients, based on their recorded visit interactions. Multiple models were developed utilising BERT, RoBERTa and DistilBERT. Two approaches were employed to improve the representation of dementia samples: oversampling the underrepresented data points in the original Pitt dataset and combining the Pitt with the Holland and Kempler datasets. The DistilBERT models trained on either an oversampled Pitt dataset or the combined dataset performed best in classifying the dementia class. Specifically, the model trained on the oversampled Pitt dataset and the one trained on the combined dataset obtained state-of-the-art performance with 98.8{\%} overall accuracy and 98.6{\%} macro-averaged F1-score, respectively. The models{'} outputs were manually inspected through saliency highlighting, using Local Interpretable Model-agnostic Explanations (LIME), to provide a better understanding of its predictions.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="abdelhalim-etal-2023-training">
    <titleInfo>
      <title>Training Models on Oversampled Data and a Novel Multi-class Annotation Scheme for Dementia Detection</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Nadine</namePart>
      <namePart type="family">Abdelhalim</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Ingy</namePart>
      <namePart type="family">Abdelhalim</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Riza</namePart>
      <namePart type="family">Batista-Navarro</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 5th Clinical Natural Language Processing Workshop</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Tristan</namePart>
        <namePart type="family">Naumann</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Asma</namePart>
        <namePart type="family">Ben Abacha</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Steven</namePart>
        <namePart type="family">Bethard</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Kirk</namePart>
        <namePart type="family">Roberts</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Anna</namePart>
        <namePart type="family">Rumshisky</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Toronto, Canada</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This work introduces a novel three-class annotation scheme for text-based dementia classification in patients, based on their recorded visit interactions. Multiple models were developed utilising BERT, RoBERTa and DistilBERT. Two approaches were employed to improve the representation of dementia samples: oversampling the underrepresented data points in the original Pitt dataset and combining the Pitt with the Holland and Kempler datasets. The DistilBERT models trained on either an oversampled Pitt dataset or the combined dataset performed best in classifying the dementia class. Specifically, the model trained on the oversampled Pitt dataset and the one trained on the combined dataset obtained state-of-the-art performance with 98.8% overall accuracy and 98.6% macro-averaged F1-score, respectively. The models’ outputs were manually inspected through saliency highlighting, using Local Interpretable Model-agnostic Explanations (LIME), to provide a better understanding of their predictions.</abstract>
<identifier type="citekey">abdelhalim-etal-2023-training</identifier>
<identifier type="doi">10.18653/v1/2023.clinicalnlp-1.15</identifier>
<location>
<url>https://aclanthology.org/2023.clinicalnlp-1.15</url>
</location>
<part>
<date>2023-07</date>
<extent unit="page">
<start>118</start>
<end>124</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Training Models on Oversampled Data and a Novel Multi-class Annotation Scheme for Dementia Detection
%A Abdelhalim, Nadine
%A Abdelhalim, Ingy
%A Batista-Navarro, Riza
%Y Naumann, Tristan
%Y Ben Abacha, Asma
%Y Bethard, Steven
%Y Roberts, Kirk
%Y Rumshisky, Anna
%S Proceedings of the 5th Clinical Natural Language Processing Workshop
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F abdelhalim-etal-2023-training
%X This work introduces a novel three-class annotation scheme for text-based dementia classification in patients, based on their recorded visit interactions. Multiple models were developed utilising BERT, RoBERTa and DistilBERT. Two approaches were employed to improve the representation of dementia samples: oversampling the underrepresented data points in the original Pitt dataset and combining the Pitt with the Holland and Kempler datasets. The DistilBERT models trained on either an oversampled Pitt dataset or the combined dataset performed best in classifying the dementia class. Specifically, the model trained on the oversampled Pitt dataset and the one trained on the combined dataset obtained state-of-the-art performance with 98.8% overall accuracy and 98.6% macro-averaged F1-score, respectively. The models’ outputs were manually inspected through saliency highlighting, using Local Interpretable Model-agnostic Explanations (LIME), to provide a better understanding of their predictions.
%R 10.18653/v1/2023.clinicalnlp-1.15
%U https://aclanthology.org/2023.clinicalnlp-1.15
%U https://doi.org/10.18653/v1/2023.clinicalnlp-1.15
%P 118-124
Markdown (Informal)
[Training Models on Oversampled Data and a Novel Multi-class Annotation Scheme for Dementia Detection](https://aclanthology.org/2023.clinicalnlp-1.15) (Abdelhalim et al., ClinicalNLP 2023)
ACL
Nadine Abdelhalim, Ingy Abdelhalim, and Riza Batista-Navarro. 2023. Training Models on Oversampled Data and a Novel Multi-class Annotation Scheme for Dementia Detection. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 118–124, Toronto, Canada. Association for Computational Linguistics.
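As a rough illustration of the workflow the abstract describes (oversampling the underrepresented dementia class, then inspecting a fine-tuned transformer's predictions with LIME saliency highlighting), the sketch below shows the general shape of such code. It is not the authors' implementation: the model checkpoint, class names, example texts, labels, and the predict_proba helper are all placeholder assumptions.

# Hypothetical sketch only: the checkpoint name, class names, texts and labels
# below are placeholders, not the paper's data or its fine-tuned three-class
# DistilBERT model.
import numpy as np
from sklearn.utils import resample
from lime.lime_text import LimeTextExplainer
from transformers import pipeline

# Oversampling: duplicate minority-class transcripts with replacement so the
# dementia class is no longer underrepresented (toy data shown here).
texts = ["transcript one", "transcript two", "transcript three"]
labels = [0, 0, 1]
minority = [t for t, y in zip(texts, labels) if y == 1]
extra = resample(minority, replace=True, n_samples=2, random_state=42)
train_texts = texts + extra

# A text-classification pipeline; top_k=None returns scores for every class.
clf = pipeline("text-classification",
               model="distilbert-base-uncased",  # placeholder checkpoint
               top_k=None)

def predict_proba(batch):
    """Adapt pipeline output to the (n_samples, n_classes) array LIME expects."""
    outs = clf(list(batch))
    # Sort scores by label name so columns line up across examples.
    return np.array([[s["score"] for s in sorted(out, key=lambda d: d["label"])]
                     for out in outs])

# LIME perturbs the input text, fits a local surrogate model, and reads off
# per-token weights -- the basis of the saliency highlighting in the paper.
explainer = LimeTextExplainer(class_names=["control", "uncertain", "dementia"])  # hypothetical names
exp = explainer.explain_instance("the boy is reaching for the cookie jar",
                                 predict_proba, num_features=6, num_samples=500)
print(exp.as_list())  # [(token, weight), ...]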