Improving Classification of Infrequent Cognitive Distortions: Domain-Specific Model vs. Data Augmentation

Xiruo Ding, Kevin Lybarger, Justin Tauscher, Trevor Cohen


Abstract
Cognitive distortions are counterproductive patterns of thinking that are among the targets of cognitive behavioral therapy (CBT). These can be challenging for clinicians to detect, especially those without extensive CBT training or supervision. Text classification methods can approximate expert clinician judgment in detecting frequently occurring cognitive distortions in text-based therapy messages. However, performance on infrequent distortions is relatively poor. In this study, we address this sparsity problem with two approaches: data augmentation and a domain-specific model. The first approach includes Easy Data Augmentation, back translation, and mixup techniques. The second utilizes a domain-specific pretrained language model, MentalBERT. To examine the viability of the different data augmentation methods, we used a real-world dataset of texts between therapists and clients diagnosed with serious mental illness that was annotated for distorted thinking. We found that, with optimized parameter settings, mixup was helpful for rare classes. Performance improvements with the domain-specific model, MentalBERT, exceeded those obtained with data augmentation.
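Of the augmentation techniques named above, mixup is the one the abstract singles out as helpful for rare classes. As a rough illustration (not the paper's implementation), mixup forms convex combinations of pairs of training examples and their labels, with the mixing weight drawn from a Beta distribution; for text, this is typically applied to sentence embeddings rather than raw tokens. The function name, alpha value, and toy vectors below are illustrative assumptions:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two (embedding, one-hot label) pairs with a Beta(alpha, alpha) weight.

    Returns a synthetic example lying on the line segment between the inputs,
    which is how mixup can supply extra training signal for rare classes.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # interpolated features
    y = lam * y1 + (1.0 - lam) * y2       # interpolated (soft) labels
    return x, y

# Toy example: mix a frequent-class example with a rare-class one.
x_a, y_a = np.array([1.0, 0.0]), np.array([1.0, 0.0])  # frequent class
x_b, y_b = np.array([0.0, 1.0]), np.array([0.0, 1.0])  # rare class
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b)
```

Because the labels are interpolated as well, the resulting soft label still sums to one, and training proceeds with a standard cross-entropy loss against it.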
Anthology ID:
2022.naacl-srw.9
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Month:
July
Year:
2022
Address:
Hybrid: Seattle, Washington + Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
68–75
URL:
https://aclanthology.org/2022.naacl-srw.9
DOI:
10.18653/v1/2022.naacl-srw.9
Cite (ACL):
Xiruo Ding, Kevin Lybarger, Justin Tauscher, and Trevor Cohen. 2022. Improving Classification of Infrequent Cognitive Distortions: Domain-Specific Model vs. Data Augmentation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pages 68–75, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
Cite (Informal):
Improving Classification of Infrequent Cognitive Distortions: Domain-Specific Model vs. Data Augmentation (Ding et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-srw.9.pdf
Video:
https://aclanthology.org/2022.naacl-srw.9.mp4