%0 Conference Proceedings
%T Improving Classification of Infrequent Cognitive Distortions: Domain-Specific Model vs. Data Augmentation
%A Ding, Xiruo
%A Lybarger, Kevin
%A Tauscher, Justin
%A Cohen, Trevor
%Y Ippolito, Daphne
%Y Li, Liunian Harold
%Y Pacheco, Maria Leonor
%Y Chen, Danqi
%Y Xue, Nianwen
%S Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
%D 2022
%8 July
%I Association for Computational Linguistics
%C Hybrid: Seattle, Washington + Online
%F ding-etal-2022-improving
%X Cognitive distortions are counterproductive patterns of thinking that are one of the targets of cognitive behavioral therapy (CBT). These can be challenging for clinicians to detect, especially those without extensive CBT training or supervision. Text classification methods can approximate expert clinician judgment in the detection of frequently occurring cognitive distortions in text-based therapy messages. However, performance with infrequent distortions is relatively poor. In this study, we address this sparsity problem with two approaches: Data Augmentation and Domain-Specific Model. The first approach includes Easy Data Augmentation, back translation, and mixup techniques. The second approach utilizes a domain-specific pretrained language model, MentalBERT. To examine the viability of different data augmentation methods, we utilized a real-world dataset of texts between therapists and clients diagnosed with serious mental illness that was annotated for distorted thinking. We found that with optimized parameter settings, mixup was helpful for rare classes. Performance improvements with an augmented model, MentalBERT, exceed those obtained with data augmentation.
%R 10.18653/v1/2022.naacl-srw.9
%U https://aclanthology.org/2022.naacl-srw.9
%U https://doi.org/10.18653/v1/2022.naacl-srw.9
%P 68-75