Sadegh Jafari

2025

pdf bib abs
DadmaTools V2: an Adapter-Based Natural Language Processing Toolkit for the Persian Language
Sadegh Jafari | Farhan Farsi | Navid Ebrahimi | Mohamad Bagher Sajadi | Sauleh Eetemadi
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script

DadmaTools V2 is a comprehensive repository designed to enhance NLP capabilities for the Persian language, catering to industry practitioners seeking practical and efficient solutions. The toolkit provides extensive code examples demonstrating the integration of its models with popular NLP frameworks such as Trankit and Transformers, as well as deep learning frameworks like PyTorch. Additionally, DadmaTools supports widely used Persian embeddings and datasets, ensuring robust language processing capabilities. The latest version of DadmaTools introduces an adapter-based technique, significantly reducing memory usage by employing a shared pre-trained model across various tasks, supplemented with task-specific adapter layers. This approach eliminates the need to maintain multiple pre-trained models and optimize resource utilization. Enhancements in this version include adding new modules such as a sentiment detector, an informal-to-formal text converter, and a spell checker, further expanding the toolkit’s functionality. DadmaTools V2 thus represents a powerful, efficient, and versatile resource for advancing Persian NLP applications.

Mental health disorders such as stress, anxiety, and depression are increasingly prevalent globally, yet access to care remains limited due to barriers like geographic isolation, financial constraints, and stigma. Conversational agents or chatbots have emerged as viable digital tools for personalized mental health support. This paper presents the development of a psychological health chatbot designed specifically for Persian-speaking individuals, offering a culturally sensitive tool for emotion detection and disorder identification. The chatbot integrates several advanced natural language processing (NLP) modules, leveraging the ArmanEmo dataset to identify emotions, assess psychological states, and ensure safe, appropriate responses. Our evaluation of various models, including ParsBERT and XLM-RoBERTa, demonstrates effective emotion detection with accuracy up to 75.39%. Additionally, the system incorporates a Large Language Model (LLM) to generate messages. This chatbot serves as a promising solution for addressing the accessibility gap in mental health care and provides a scalable, language-inclusive platform for psychological support.

pdf bib abs
data2lang2vec: Data Driven Typological Features Completion
Hamidreza Amirzadeh | Sadegh Jafari | Anika Harju | Rob van der Goot
Proceedings of the 31st International Conference on Computational Linguistics

Language typology databases enhance multi-lingual Natural Language Processing (NLP) by improving model adaptability to diverse linguistic structures. The widely-used lang2vec toolkit integrates several such databases, but its coverage remains limited at 28.9%. Previous work on automatically increasing coverage predicts missing values based on features from other languages or focuses on single features, we propose to use textual data for better-informed feature prediction. To this end, we introduce a multi-lingual Part-of-Speech (POS) tagger, achieving over 70% accuracy across 1,749 languages, and experiment with external statistical features and a variety of machine learning algorithms. We also introduce a more realistic evaluation setup, focusing on likely to be missing typology features, and show that our approach outperforms previous work in both setups.

2024

pdf bib abs
DRAGON at FIGNEWS 2024 Shared Task: a Dedicated RAG for October 7th conflict News
Sadegh Jafari | Mohsen Mahmoodzadeh | Vanooshe Nazari | Razieh Bahmanyar | Kathryn Burrows
Proceedings of the Second Arabic Natural Language Processing Conference

In this study, we present a novel approach to annotating bias and propaganda in social media data by leveraging topic modeling techniques. Utilizing the BERTopic tool, we performed topic modeling on the FIGNEWS Shared-task dataset, which initially comprised 13,500 samples. From this dataset, we identified 35 distinct topics and selected approximately 50 representative samples from each topic, resulting in a subset of 1,812 samples. These selected samples were meticulously annotated for bias and propaganda labels. Subsequently, we employed multiple methods like KNN, SVC, XGBoost, and RAG to develop a classifier capable of detecting bias and propaganda within social media content. Our approach demonstrates the efficacy of using topic modeling for efficient data subset selection and provides a robust foundation for improving the accuracy of bias and propaganda detection in large-scale social media datasets.

2023

pdf bib abs
A longitudinal study about gradual changes in the Iranian Online Public Sphere pre and post of ‘Mahsa Moment’: Focusing on Twitter
Sadegh Jafari | Amin Fathi | Abolfazl Hajizadegan | Amirmohammad Kazemeini | Sauleh Eetemadi
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

Mahsa Amini’s death shocked Iranian society. The effects of this event and the subsequent tragedies in Iran not only in realspace but also in cyberspace, including Twitter, were tremendous and unimaginable. We explore how Twitter has changed after Mahsa Amini’s death by analyzing the sentiments of Iranian users in the 90 days after this event. Additionally, we track the change in word meaning and each word’s neighboring words. Finally, we use word clustering methods for topic modeling.