Zafar Sarif
2025
Identifying Severity of Depression in Forum Posts using Zero-Shot Classifier and DistilBERT Model
Zafar Sarif | Sannidhya Das | Dr. Abhishek Das | Md Fahin Parvej | Dipankar Das
Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities
This paper presents our approach to the RANLP 2025 Shared Task on “Identification of the Severity of Depression in Forum Posts.” The objective of the task is to classify user-generated posts into one of four severity levels of depression: subthreshold, mild, moderate, or severe. A key challenge in the task was the absence of annotated training data. To address this, we employed a two-stage pipeline: first, we used zero-shot classification with facebook/bart-large-mnli to generate pseudo-labels for the unlabeled training set. Next, we fine-tuned a DistilBERT model on the pseudo-labeled data for multi-class classification. Our system achieved an internal accuracy of 0.92 on the pseudo-labeled test set and an accuracy of 0.289 on the official blind evaluation set. These results demonstrate the feasibility of leveraging zero-shot learning and weak supervision for mental health classification tasks, even in the absence of gold-standard annotations.
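The pseudo-labeling stage described above can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the helper name `pseudo_label` and the confidence threshold are assumptions, and the actual zero-shot pipeline call (which requires downloading facebook/bart-large-mnli) is shown only as a comment.

```python
# Sketch of stage 1 of the pipeline: turning zero-shot classifier
# outputs into pseudo-labels for the unlabeled training set.
# (Hypothetical helper; the shared-task implementation is not public.)

CANDIDATE_LABELS = ["subthreshold", "mild", "moderate", "severe"]

def pseudo_label(zs_output, threshold=0.0):
    """Map one zero-shot classification result to a pseudo-label.

    `zs_output` follows the Hugging Face zero-shot pipeline format:
    {"labels": [...], "scores": [...]} with scores sorted descending.
    Returns the top label, or None if its score falls below `threshold`
    (a simple way to discard low-confidence pseudo-labels before
    fine-tuning DistilBERT on the remainder).
    """
    top_label, top_score = zs_output["labels"][0], zs_output["scores"][0]
    return top_label if top_score >= threshold else None

# Stage 1 would actually be run with the transformers library, e.g.:
#   from transformers import pipeline
#   zs = pipeline("zero-shot-classification",
#                 model="facebook/bart-large-mnli")
#   out = zs(post_text, candidate_labels=CANDIDATE_LABELS)
#   label = pseudo_label(out, threshold=0.5)

# Example with a mocked classifier output:
mock_out = {"labels": ["moderate", "mild", "severe", "subthreshold"],
            "scores": [0.62, 0.21, 0.10, 0.07]}
print(pseudo_label(mock_out, threshold=0.5))
```

The threshold is optional weak-supervision hygiene: keeping only confident pseudo-labels trades training-set size for label quality before the DistilBERT fine-tuning stage.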
Trans-Sent at SemEval-2025 Task 11: Text-based Multi-label Emotion Detection using Pre-Trained BERT Transformer Models
Zafar Sarif | Md Sharib Akhtar | Abhishek Das | Dipankar Das
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
We introduce Trans-Sent, a Transformer-based model designed for multi-label emotion classification in SemEval-2025 Task 11. The model predicts perceived emotions such as joy, sadness, anger, fear, surprise, and disgust from text in seven languages: Amharic, German, English, Hindi, Marathi, Russian, and Romanian. To handle data imbalance, the system incorporates preprocessing techniques, SMOTE oversampling, and feature engineering to enhance classification accuracy. The model was trained on the BRIGHTER and EthioEmo datasets, which contain diverse textual sources such as social media, news, literature, and personal narratives. Traditional machine learning models, including Logistic Regression and Decision Trees, were tested but proved inadequate for multi-label classification due to their limited ability to capture contextual and semantic meaning. Fine-tuned BERT models demonstrated superior performance, with Russian achieving our highest ranking (9th overall), while languages with complex grammar, such as German and Amharic, scored lower. Future enhancements may include advanced data augmentation, cross-lingual learning, and multimodal emotion analysis to improve classification across languages. Trans-Sent contributes to NLP by advancing multi-label emotion detection, particularly in underrepresented languages.
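The key difference the abstract draws between the traditional classifiers and the fine-tuned BERT models is the multi-label decision step: each emotion is scored independently rather than forced into a single class. A minimal sketch of that step, assuming per-emotion logits from a sigmoid-output classification head (the function name and threshold are illustrative, not from the paper):

```python
import math

# The six perceived emotions predicted in SemEval-2025 Task 11.
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "disgust"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_predict(logits, threshold=0.5):
    """Map one text's per-emotion logits to a set of predicted labels.

    Unlike single-label softmax classification, multi-label detection
    applies an independent sigmoid per emotion, so any number of labels
    (including zero) can fire for the same text.
    """
    return [e for e, z in zip(EMOTIONS, logits) if sigmoid(z) >= threshold]

# Example: strong positive logits for sadness and fear, negative elsewhere.
print(multilabel_predict([-2.0, 1.5, -1.0, 0.8, -3.0, -0.5]))
# -> ['sadness', 'fear']
```

This independence is why a standard Decision Tree or Logistic Regression, which predicts one class per example, fits the task poorly without a one-vs-rest wrapper.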