Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Bharathi R. Chakravarthi, B. Bharathi, Josephine Griffith, Kalika Bali, Paul Buitelaar (Editors)

Anthology ID: 2023.ltedi-1
Month: September
Year: 2023
Address: Varna, Bulgaria
Venues: LTEDI | WS
Events: Workshop on Language Technology for Equality, Diversity, and Inclusion (2023) | International Conference Recent Advances in Natural Language Processing (2023) | Other Workshops and Events (2023)
Publisher: INCOMA Ltd., Shoumen, Bulgaria
URL: https://aclanthology.org/2023.ltedi-1/
PDF: https://aclanthology.org/2023.ltedi-1.pdf

An Exploration of Zero-Shot Natural Language Inference-Based Hate Speech Detection
Nerses Yuzbashyan | Nikolay Banar | Ilia Markov | Walter Daelemans

Conventional techniques for detecting online hate speech rely on the availability of a sufficient number of annotated instances, which can be costly and time-consuming to obtain. For this reason, zero-shot or few-shot detection can offer an attractive alternative. In this paper, we explore a zero-shot detection approach based on natural language inference (NLI) models. Since the performance of the models in this approach depends heavily on the choice of a hypothesis, our goal is to determine which factors affect the quality of detection. We conducted a set of experiments with three NLI models and four hate speech datasets. We demonstrate that a zero-shot NLI-based approach is competitive with approaches that require supervised learning, yet it is highly sensitive to the choice of hypothesis. In addition, our experiments indicate that the results for a set of hypotheses on different model-data pairs are positively correlated, and that the correlation is higher for different datasets when using the same model than it is for different models when using the same dataset. These results suggest that if we find a hypothesis that works well for a specific model and domain or for a specific type of hate speech, we can use that hypothesis with the same model within a different domain as well. Another model, however, might require different hypotheses in order to achieve high performance.

English2BSL: A Rule-Based System for Translating English into British Sign Language
Phoebe Alexandra Pinney | Riza Batista-Navarro

British Sign Language (BSL) is a complex language with its own vocabulary and grammatical structure, separate from English. Despite its long-standing and widespread use by Deaf communities within the UK, thus far, there have been no effective tools for translating written English into BSL. This overt lack of available resources made learning the language highly inaccessible for most people, exacerbating the communication barrier between hearing and Deaf individuals. This paper introduces a rule-based translation system, designed with the ambitious aim of creating the first web application that is not only able to translate sentences in written English into a BSL video output, but can also serve as a learning aid to empower the development of BSL proficiency.

Multilingual Models for Sentiment and Abusive Language Detection for Dravidian Languages
Anand Kumar M

This paper presents TF-IDF-based LSTM and Hierarchical Attention Network (HAN) models for code-mixed abusive comment detection and sentiment analysis in Dravidian languages. The traditional TF-IDF-based techniques outperformed the Hierarchical Attention models in both the sentiment analysis and abusive language detection tasks. The Tulu sentiment analysis system demonstrated better performance for the Positive and Neutral classes, whereas the Tamil sentiment analysis system exhibited lower performance overall. This highlights the need for more balanced datasets and additional research to enhance the accuracy of sentiment analysis in the Tamil language. In terms of abusive language detection, the TF-IDF-LSTM models generally outperformed the Hierarchical Attention models. However, the mixed models displayed better performance for specific classes such as “Homophobia” and “Xenophobia.” This implies that considering both code-mixed and original script data can offer a different perspective for research in social media analysis.

Social media has become a vital platform for personal communication. Its widespread use as a primary means of public communication offers an exciting opportunity for early detection and management of mental health issues. People often share their emotions on social media, but understanding the true depth of their feelings can be challenging. Depression, a prevalent problem among young people, is of particular concern due to its link with rising suicide rates. Identifying depression levels in social media texts is crucial for timely support and prevention of negative outcomes. However, it’s a complex task because human emotions are dynamic and can change significantly over time. The DepSign-LT-EDI@RANLP 2023 shared task aims to classify social media text into three depression levels: “Not Depressed,” “Moderately Depressed,” and “Severely Depressed.” This overview covers task details, dataset, methodologies used, and results analysis. RoBERTa-based models emerged as top performers, with the best result achieving an impressive macro F1-score of 0.584 among 31 participating teams.
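Several abstracts in this volume report a macro F1-score. As a minimal sketch of how that metric is usually computed (the unweighted mean of per-class F1), the following may help when comparing the numbers. The function name and the toy labels are illustrative assumptions and are not taken from the shared task's evaluation scripts.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical labels (not shared-task data).
gold = ["not", "not", "moderate", "severe"]
pred = ["not", "moderate", "moderate", "severe"]
score = macro_f1(gold, pred)
```

Because every class contributes equally, a rare class such as "severe" weighs as much as the majority class, which is one reason macro F1 values on imbalanced data tend to be lower than accuracy.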

This paper presents an overview of the shared task on Speech Recognition for Vulnerable Individuals in Tamil (LT-EDI-ACL2023). The task provides a Tamil dataset collected from elderly people of three different genders: male, female and transgender. The audio samples were recorded in public locations such as hospitals, markets and vegetable shops. The dataset was released in two phases, a training phase and a testing phase. The participants were asked to use different models and methods to handle audio signals and to submit their results as transcriptions of the given test samples. The results submitted by the participants were evaluated using WER (Word Error Rate). The participants used transformer-based models for automatic speech recognition. The results and the different pre-trained transformer-based models used by the participants are discussed in this overview paper.
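For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between the reference and the system transcription, divided by the number of reference words. The sketch below illustrates that standard definition. It is an assumption that the task used exactly this formulation, and the function name, whitespace tokenisation and example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a bounded percentage.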

We present an overview of the second shared task on homophobia/transphobia detection in social media comments. Given a comment, a system must predict whether or not it contains any form of homophobia/transphobia. The shared task included five languages: English, Spanish, Tamil, Hindi, and Malayalam. The data was provided for two tasks: Task A used three labels, and Task B used seven fine-grained labels. In total, 75 teams enrolled for the shared task on CodaLab. For Task A, 12 teams submitted systems for English, eight teams for Tamil, eight teams for Spanish, and seven teams for Hindi. For Task B, nine teams submitted systems for English, seven teams for Tamil, and six teams for Malayalam. We present and analyze all submissions in this paper.

Hope serves as a powerful driving force that encourages individuals to persevere in the face of the unpredictable nature of human existence. It instills motivation within us to remain steadfast in our pursuit of important goals, regardless of the uncertainties that lie ahead. In today’s digital age, platforms such as Facebook, Twitter, Instagram, and YouTube have emerged as prominent social media outlets where people freely express their views and opinions. These platforms have also become crucial for marginalized individuals seeking online assistance and support[1][2][3]. The outbreak of the pandemic has exacerbated people’s fears around the world, as they grapple with the possibility of losing loved ones and the lack of access to essential services such as schools, hospitals, and mental health facilities.

Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish
Henrik Björklund | Hannah Devinney

Part of Speech (POS) taggers for Swedish routinely fail for the third person gender-neutral pronoun “hen”, despite the fact that it has been a well-established part of the Swedish language since at least 2014. In addition to simply being a form of gender bias, this failure can have negative effects on other tasks relying on POS information. We demonstrate the usefulness of semi-synthetic augmented datasets in a case study, retraining a POS tagger to correctly recognize “hen” as a personal pronoun. We evaluate our retrained models both for tag accuracy and on a downstream task (dependency parsing) in a classical NLP pipeline. Our results show that adding such data works to correct for the disparity in performance. The accuracy rate for identifying “hen” as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity to gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also results in improvements for dependency parsing of sentences containing “hen”.

Evaluating the Impact of Stereotypes and Language Combinations on Gender Bias Occurrence in NMT Generic Systems
Bertille Triboulet | Pierrette Bouillon

Machine translation, and more specifically neural machine translation (NMT), has been shown to be subject to gender bias in recent years. Many studies have focused on evaluating and reducing this phenomenon, mainly through the analysis of the translation of occupational nouns for the same type of language combinations. In this paper, we build a test set similar to those of previous studies to investigate the influence of stereotypes and of the nature of language combinations (formed with English, French and Italian) on gender bias occurrence in NMT. As in previous studies, we confirm stereotypes as a major source of gender bias, especially in female contexts, while observing bias even in language combinations traditionally less examined.

KaustubhSharedTask@LT-EDI 2023: Homophobia-Transphobia Detection in Social Media Comments with NLPAUG-driven Data Augmentation
Kaustubh Lande | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Bharathi Raja Chakravarthi

Our research in Natural Language Processing (NLP) aims to detect hate speech comments specifically targeted at the LGBTQ+ community on the YouTube platform, as part of the shared task conducted by the LTEDI workshop. The dataset provided by the organizers exhibited a high degree of class imbalance, and to mitigate this, we employed NLPAUG, a data augmentation library. We employed several classification methods and reported the results using recall, precision, and F1-score metrics. The classification models discussed in this paper include a Bidirectional Long Short-Term Memory (BiLSTM) model trained with Word2Vec embeddings, a BiLSTM model trained with Twitter GloVe embeddings, and transformer models such as BERT, DistilBERT, RoBERTa, and XLM-RoBERTa, all of which were trained and fine-tuned. We achieved a weighted F1-score of 0.699 on the test data and secured fifth place in task B with 7 classes for the English language.

JudithJeyafreeda@LT-EDI-2023: Using GPT model for recognition of Homophobia/Transphobia detection from social media
Judith Jeyafreeda Andrew

Homophobia and transphobia are defined as hatred or discomfort towards gay, lesbian, transgender or bisexual people. With the rise of social media, communication has become free and easy. This also means that people can express hatred and discomfort towards others. Studies have shown that such comments can cause mental health issues. Thus, detecting and masking or removing these comments from social media platforms can help with understanding and improving the mental health of LGBTQ+ people. In this paper, GPT2 is used to detect homophobic and/or transphobic comments on social media. The comments used in this paper are in five languages (English, Spanish, Tamil, Malayalam and Hindi). The results show that detecting such comments is easier in English than in the other languages.

iicteam@LT-EDI-2023: Leveraging pre-trained Transformers for Fine-Grained Depression Level Detection in Social Media
Vajratiya Vajrobol | Nitisha Aggarwal | Karanpreet Singh

Depression is a prevalent mental illness characterized by feelings of sadness and a lack of interest in daily activities. Early detection of depression is crucial to prevent severe consequences, making it essential to observe and treat the condition at its onset. At ACL-2022, the DepSign-LT-EDI project aimed to identify signs of depression in individuals based on their social media posts, where people often share their emotions and feelings. Using social media postings in English, the system categorized depression signs into three labels: “not depressed,” “moderately depressed,” and “severely depressed.” To achieve this, our team applied MentalRoBERTa, a model trained on large-scale mental health data. The test results indicated a macro F1-score of 0.439, ranking fourth in the shared task.

JA-NLP@LT-EDI-2023: Empowering Mental Health Assessment: A RoBERTa-Based Approach for Depression Detection
Jyoti Kumari | Abhinav Kumar

Depression, a widespread mental health disorder, affects a significant portion of the global population. Timely identification and intervention play a crucial role in ensuring effective treatment and support. Therefore, this research paper proposes a fine-tuned RoBERTa-based model for identifying depression in social media posts. In addition to the proposed model, Sentence-BERT is employed to encode social media posts into vector representations. These encoded vectors are then utilized in eight different popular classical machine learning models. The proposed fine-tuned RoBERTa model achieved a best macro F1-score of 0.55 for the development dataset and a comparable score of 0.41 for the testing dataset. Additionally, combining Sentence-BERT with Naive Bayes (S-BERT + NB) outperformed the fine-tuned RoBERTa model, achieving a slightly higher macro F1-score of 0.42. This demonstrates the effectiveness of the approach in detecting depression from social media posts.

Team-KEC@LT-EDI: Detecting Signs of Depression from Social Media Text
Malliga S | Kogilavani Shanmugavadivel | Arunaa S | Gokulkrishna R | Chandramukhii A

The rise of social media has led to a drastic surge in the dissemination of hostile and toxic content, fostering an alarming proliferation of hate speech, inflammatory remarks, and abusive language. The study utilized different techniques to represent the text data in a numerical format. Word embedding techniques aim to capture the semantic and syntactic information of the text data, which is essential in text classification tasks. The study utilized various techniques such as CNN, BERT, and N-gram to classify social media posts into depression and non-depression categories. Text classification tasks often rely on deep learning techniques such as Convolutional Neural Networks (CNN), while the pre-trained BERT model has shown exceptional performance in a range of natural language processing tasks. To assess the effectiveness of the suggested approaches, the research employed multiple metrics, including accuracy, precision, recall, and F1-score. The outcomes of the investigation indicate that the suggested techniques can identify symptoms of depression with an average accuracy rate of 56%.

cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models
Sidney Wong | Matthew Durward | Benjamin Adams | Jonathan Dunn

This paper describes our multiclass classification system developed as part of the LT-EDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-language pretrained language model, XLM-RoBERTa, with spatially and temporally relevant social media language data. We found the inclusion of this spatio-temporal data improved the classification performance for all language and task conditions when compared with the baseline. We also retrained a subset of models with simulated script-mixed social media language data with varied performance. The results from the current study suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.

NLP_CHRISTINE@LT-EDI-2023: RoBERTa & DeBERTa Fine-tuning for Detecting Signs of Depression from Social Media Text
Christina Christodoulou

The paper describes the system for the 4th Shared task on “Detecting Signs of Depression from Social Media Text” at LT-EDI@RANLP 2023, which aimed to identify signs of depression in English social media texts. The solution comprised data cleaning and pre-processing, the use of additional data, a method to deal with data imbalance, and fine-tuning of two transformer-based pre-trained language models, RoBERTa-Large and DeBERTa-V3-Large. Four model architectures were developed by leveraging different word embedding pooling methods, namely a RoBERTa-Large bidirectional GRU model using GRU pooling and three DeBERTa models using CLS pooling, mean pooling and max pooling, respectively. Although ensemble learning of DeBERTa’s pooling methods through majority voting was employed for better performance, the RoBERTa bidirectional GRU model placed 8th out of 31 submissions with a Macro-F1 score of 0.42.
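The majority-voting ensemble mentioned above can be sketched generically as follows. This is an illustration of hard voting over per-model label predictions, not the authors' implementation: the function name, the toy predictions, and the tie-breaking rule (prefer the earliest-listed model among tied labels) are all assumptions, since the abstract does not describe how ties were resolved.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard-voting ensemble: pick the most frequent label for each example.

    Ties are broken in favour of the earliest-listed model (an assumption).
    """
    lengths = {len(preds) for preds in predictions_per_model}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of examples")
    voted = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        voted.append(next(label for label in votes if counts[label] == best))
    return voted


# Hypothetical outputs of three pooling variants on three posts.
cls_pool = ["not", "moderate", "severe"]
mean_pool = ["not", "severe", "severe"]
max_pool = ["moderate", "severe", "not"]
ensemble = majority_vote([cls_pool, mean_pool, max_pool])
```

With an odd number of voters and three classes, three-way ties remain possible, which is why an explicit tie-breaking rule is needed even in this small setting.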

IIITDWD@LT-EDI-2023 Unveiling Depression: Using pre-trained language models for Harnessing Domain-Specific Features and Context Information
Shankar Biradar | Sunil Saumya | Sanjana Kavatagi

Depression has become a common health problem impacting millions of individuals globally. Workplace stress and an unhealthy lifestyle have increased in recent years, leading to an increase in the number of people experiencing depressive symptoms. The spread of the epidemic has further exacerbated the problem. Early detection and precise prediction of depression are critical for early intervention and support for individuals at risk. However, due to the social stigma associated with the illness, many people are afraid to consult healthcare specialists, making early detection practically impossible. As a result, alternative strategies for depression prediction are being investigated, one of which is analyzing users’ social media posting behaviour. The organizers of LT-EDI@RANLP carried out a shared task to encourage research in this area. Our team participated in the shared task and secured 21st rank with a macro F1 score of 0.36. This article provides a summary of the model presented in the shared task.

CIMAT-NLP@LT-EDI-2023: Finegrain Depression Detection by Multiple Binary Problems Approach
María de Jesús García Santiago | Fernando Sánchez Vega | Adrián Pastor López Monroy

This paper describes the work of the team CIMAT-NLP on the Shared Task on Detecting Signs of Depression from Social Media Text at LT-EDI@RANLP 2023, which consists of classifying text from social media into three levels of depression: “not depression”, “moderate” depression and “severe” depression. In this work, we propose two approaches: (1) a transformer model that can handle long texts without truncation, and (2) an ensemble of six binary Bag-of-Words models. Our team placed fourth in the competition and found that models trained with our approaches could place second.

SIS@LT-EDI-2023: Detecting Signs of Depression from Social Media Text
Sulaksha B K | Shruti Krishnaveni S | Ivana Steeve | Monica Jenefer B

Various biological, genetic, psychological or social factors that feature a target oriented life with chronic stress and frequent traumatic experiences, lead to pessimism and apathy. The massive scale of depression should be dealt with as a disease rather than a ‘phase’ that is neglected by the majority. However, not a lot of people are aware of depression and its impact. Depression is a serious issue that should be treated in the right way. Many people dealing with depression do not realize that they have it due to the lack of awareness. This paper aims to address this issue with a tool built on the blocks of machine learning. This model analyzes the public social media texts and detects the signs of depression under three labels namely “not depressed”, “moderately depressed”, and “severely depressed” with high accuracy. The ensembled model uses three learners namely Multi-Layered Perceptron, Support Vector Machine and Multinomial Naive Bayes Classifier. The distinctive feature in this model is that it uses Artificial Neural Networks, Classifiers, Regression and Voting Classifiers to compute the final result or output.

TEAM BIAS BUSTERS@LT-EDI-2023: Detecting Signs of Depression with Generative Pretrained Transformers
Andrew Nedilko

This paper describes our methodology adopted to participate in the multi-class classification task under the auspices of the Third Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI) in the Recent Advances in Natural Language Processing (RANLP) 2023 conference. The overall objective was to employ ML algorithms to detect signs of depression in English social media content, classifying each post into one of three categories: no depression, moderate depression, and severe depression. To accomplish this we utilized generative pretrained transformers (GPTs), leveraging the full-scale OpenAI API. Our strategy incorporated prompt engineering for zero-shot and few-shot learning scenarios with ChatGPT and fine-tuning a GPT-3 model. The latter approach yielded the best results which allowed us to outperform our benchmark XGBoost classifier based on character-level features on the dev set and score a macro F1 score of 0.419 on the final blind test set.

RANGANAYAKI@LT-EDI: Hope Speech Detection using Capsule Networks
Ranganayaki Em | Abirami Murugappan | Lysa Packiam R S | Deivamani M

Hope speeches convey uplifting and motivating messages that help enhance mental health and general well-being. Hope speech detection has gained popularity in the field of natural language processing as it gives people the motivation they need to face challenges in life. The momentum behind this technology has been fueled by the demand for encouraging reinforcement online. In this paper, a deep learning approach is proposed in which four different word embedding techniques are used in combination with capsule networks, and a comparative analysis is performed to obtain results. Oversampling is used to address the class imbalance problem. The dataset used in this paper is a part of the LT-EDI RANLP 2023 Hope Speech Detection shared task. The approach proposed in this paper achieved Macro Average F1 scores of 0.49 and 0.62 on the English and Hindi-English code-mixed test data, securing 2nd and 3rd rank respectively in the above-mentioned shared task.

TechSSN1 at LT-EDI-2023: Depression Detection and Classification using BERT Model for Social Media Texts
Venkatasai Ojus Yenumulapalli | Vijai Aravindh R | Rajalakshmi Sivanaiah | Angel Deborah S

Depression is a severe mental health disorder characterized by persistent feelings of sadness and anxiety and a decline in cognitive functioning, resulting in drastic changes in a person’s psychological and physical well-being. However, depression is completely curable when treated at a suitable time with suitable treatment, resulting in the rejuvenation of the individual. The objective of this paper is to devise a technique for detecting signs of depression in English social media comments and for classifying them by intensity into severe, moderate, and not depressed categories. The paper illustrates three approaches developed while working on the problem. Of these approaches, the BERT model proved to be the most suitable, with an F1 macro score of 0.407, which gave us the 11th rank overall.

SANBAR@LT-EDI-2023: Automatic Speech Recognition: vulnerable old-aged and transgender people in Tamil
Saranya S | Bharathi B

Automatic Speech Recognition systems for Tamil are designed to convert spoken language or speech signals into written Tamil text. Seniors go to banks, clinics and authoritative workplaces to address their regular necessities. Many older people are not aware of how to use the facilities available in public places or offices. They need a person to help them. Likewise, transgender people are deprived of primary education because of social stigma, so speaking is the only way to help them meet their needs. In order to build speech-enabled systems, spontaneous speech data is collected from seniors and transgender people who are deprived of using these facilities for their own benefit. The proposed system is developed with two pretrained models: the IIT Madras transformer ASR model and the akashsivanandan/wav2vec2-large-xls-r-300m-tamil model. Both pretrained models are used to evaluate the test speech utterances, obtaining WERs of 37.7144% and 40.55% respectively.

ASR_SSN_CSE@LTEDI-2023: Pretrained Transformer based Automatic Speech Recognition system for Elderly People
Suhasini S | Bharathi B

This paper describes our submission to the Shared Task on Speech Recognition for Vulnerable Individuals in Tamil at LT-EDI-2023. The task is to develop an automatic speech recognition system for the Tamil language. The dataset provided in the task was collected from elderly people who converse in Tamil. The proposed ASR system is designed with a pre-trained model. The pre-trained model used in our system is fine-tuned with the Tamil Common Voice dataset. The test data released for the task is passed to the proposed system, and the transcriptions generated for the test samples are submitted to the task. The submitted result is evaluated by the task organizers using Word Error Rate (WER) as the evaluation metric. Our proposed system attained a WER of 39.8091%.

SSNTech2@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments Using Linear Classification Techniques
Vaidhegi D | Priya M | Rajalakshmi Sivanaiah | Angel Deborah S | Mirnalinee ThankaNadar

Abusive content on social media networks is causing destructive effects on the mental well-being of online users. Homophobia refers to fear of, negative attitudes towards, and negative feelings about homosexuality. Transphobia refers to negative attitudes, hatred and prejudice towards transsexual people. Even though some parts of society have started to accept homosexuality and transsexuality, a large part of the population still opposes them. Hate speech targeting LGBTQ+ individuals, known as homophobic/transphobic speech, has become a growing concern. This has led to a toxic and unwelcoming environment for LGBTQ+ people on online platforms. This poses a significant societal issue, hindering the progress of equality, diversity, and inclusion. The identification of homophobic and transphobic comments on social media platforms plays a crucial role in creating a safer environment for all social media users. In order to accomplish this, we built a machine learning model using SGD and SVM classifiers. Our approach yielded promising results, with a weighted F1-score of 0.95 on the English dataset, and we secured 4th rank in this task.

IJS@LT-EDI : Ensemble Approaches to Detect Signs of Depression from Social Media Text
Jaya Caporusso | Thi Hong Hanh Tran | Senja Pollak

This paper presents our ensembling solutions for detecting signs of depression in social media text, as part of the Shared Task at LT-EDI@RANLP 2023. By leveraging social media posts in English, the task involves the development of a system to accurately classify them as presenting signs of depression of one of three levels: “severe”, “moderate”, and “not depressed”. We verify the hypothesis that combining contextual information from a language model with local domain-specific features can improve the classifier’s performance. We do so by evaluating: (1) two global classifiers (support vector machine and logistic regression); (2) contextual information from language models; and (3) the ensembling results.

VEL@LT-EDI-2023: Automatic Detection of Hope Speech in Bulgarian Language using Embedding Techniques
Rahul Ponnusamy | Malliga Subramaniam | Sajeetha Thavareesan | Ruba Priyadharshini

Many people may find motivation in their lives by spreading content on social media that is encouraging or hopeful. Creating an effective model that accurately predicts the target class is a challenging task. The problem of hope speech identification is addressed in this work using machine learning and deep learning methods. This paper describes the system submitted by our team (VEL) to the Hope Speech Detection for Equality, Diversity, and Inclusion (HSD-EDI) LT-EDI-RANLP 2023 shared task for the Bulgarian language. The main goal of this shared task is to classify a given text into the Hope speech or Non-Hope speech category. The proposed method used the H2O deep learning model with MPNet embeddings and achieved the second rank for the Bulgarian language with a Macro F1 score of 0.69.

Cordyceps@LT-EDI: Patching Language-Specific Homophobia/Transphobia Classifiers with a Multilingual Understanding
Dean Ninalga

Detecting transphobia, homophobia, and various other forms of hate speech is difficult. Signals can vary depending on factors such as language, culture, geographical region, and the particular online platform. Here, we present a joint multilingual (M-L) and language-specific (L-S) approach to homophobic and transphobic hate speech detection (HSD). M-L models are needed to catch words, phrases, and concepts that are less common or missing in a particular language and subsequently overlooked by L-S models. Nonetheless, L-S models are better situated to understand the cultural and linguistic context of the users who typically write in a particular language. Here we construct a simple and successful way to merge the M-L and L-S approaches through weight interpolation in a way that is interpretable and data-driven. We demonstrate our system on task A of the “Shared Task on Homophobia/Transphobia Detection in social media comments” dataset for homophobic and transphobic HSD. Our system achieves the best results in three of five languages and achieves a 0.997 macro average F1-score on Malayalam texts.
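As a rough illustration of what weight interpolation between two models with identical architectures can look like, the sketch below linearly blends matching parameters. It is not the paper's method in detail: the single scalar `alpha`, the plain-list parameter representation, the parameter names and the function name are hypothetical, and the abstract does not say how the interpolation coefficient is chosen.

```python
def interpolate_weights(multilingual, language_specific, alpha):
    """Blend two parameter sets: alpha * multilingual + (1 - alpha) * language_specific.

    Both arguments map parameter names to flat lists of floats and must share
    the same names and shapes (same architecture is assumed).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if multilingual.keys() != language_specific.keys():
        raise ValueError("models must have the same parameter names")
    merged = {}
    for name, ml_values in multilingual.items():
        ls_values = language_specific[name]
        if len(ml_values) != len(ls_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * m + (1.0 - alpha) * l
                        for m, l in zip(ml_values, ls_values)]
    return merged


# Toy parameters standing in for real model weights.
ml_model = {"w": [1.0, 3.0], "b": [0.0]}
ls_model = {"w": [3.0, 1.0], "b": [2.0]}
merged = interpolate_weights(ml_model, ls_model, alpha=0.25)
```

In such a scheme, `alpha` close to 0 keeps the language-specific model nearly unchanged, while `alpha` close to 1 moves towards the multilingual one, so the coefficient itself gives a readable measure of how much multilingual knowledge is mixed in.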

Cordyceps@LT-EDI : Depression Detection with Reddit and Self-training
Dean Ninalga

Depression is debilitating, and not uncommon. Indeed, studies of excessive social media users show correlations with depression, ADHD, and other mental health concerns. Given that a large number of people use social media excessively, there is a significant population of potentially undiagnosed users and of posts that they create. In this paper, we propose a depression detection system using a semi-supervised learning technique. Namely, we use a trained model to classify a large number of unlabelled social media posts from Reddit, then use these generated labels to train a more powerful classifier. We demonstrate our framework on the Detecting Signs of Depression from Social Media Text - LT-EDI@RANLP 2023 shared task, where our framework ranks 3rd overall.

pdf bib abs

TechWhiz@LT-EDI-2023: Transformer Models to Detect Levels of Depression from Social Media Text
Madhumitha M | Jerin Mahibha C | Thenmozhi D.

Depression is a mental health disorder characterized by persistent feelings of unhappiness, emptiness, and a loss of interest in activities. It can influence different facets of one’s life, including their hopes, sympathy, and nature. Depression can stem from a variety of determinants, such as genetic predisposition, life events, and social circumstances. In recent years, the influence of social media on mental health has become an increasing concern. Excessive use of social media, and the negative aspects that accompany it, can exacerbate or cause feelings of distress. The nonstop exposure to carefully curated lives, social comparison, cyberbullying, and the pressure to meet unrealistic standards can impact an individual’s self-esteem, social connections, and overall well-being. We participated in the shared task at DepSignLT-EDI@RANLP 2023 and have proposed a model that identifies the levels of depression from social media text using the data set shared for the task. The proposed model uses different transformer models, namely ALBERT and RoBERTa, to implement the task. The macro F1 scores obtained by the ALBERT and RoBERTa models are 0.258 and 0.143 respectively.

pdf bib abs

CSE_SPEECH@LT-EDI-2023: Automatic Speech Recognition vulnerable old-aged and transgender people in Tamil
Varsha Balaji | Archana Jp | Bharathi B

This paper focuses on using Automatic Speech Recognition (ASR) for vulnerable old-aged and transgender people in Tamil. The Amrrs/wav2vec2-large-xlsr-53-tamil model achieves a Word Error Rate (WER) of 40%. By leveraging this model, ASR technology improves accessibility and inclusivity, helping those with speech impairments, hearing impairments, and cognitive disabilities. Further refinements are needed to reduce errors and improve the user experience. This study emphasizes the significance of ASR, particularly the Amrrs/wav2vec2-large-xlsr-53-tamil model, in facilitating effective communication and accessibility for vulnerable populations in Tamil.

pdf bib abs

VTUBGM@LT-EDI-2023: Hope Speech Identification using Layered Differential Training of ULMFit
Sanjana M. Kavatagi | Rashmi R. Rachh | Shankar S. Biradar

Hope speech embodies optimistic and uplifting sentiments, aiming to inspire individuals to maintain faith in positive progress and actively contribute to a better future. In this article, we outline the model presented by our team, VTUBGM, for the shared task “Hope Speech Detection for Equality, Diversity, and Inclusion” at LT-EDI-RANLP 2023. This task entails classifying YouTube comments, which is a classification problem at the comment level. The task was conducted in four different languages: Bulgarian, English, Hindi, and Spanish. VTUBGM submitted a model developed through layered differential training of the ULMFit model. The model obtained a macro F1 score of 0.48 and ranked 3rd in the competition.

pdf bib abs

ML&AI_IIITRanchi@LT-EDI-2023: Identification of Hope Speech of YouTube comments in Mixed Languages
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma

Hope speech analysis refers to the examination and evaluation of speeches or messages that aim to instill hope, inspire optimism, and motivate individuals or communities. It involves analyzing the content, language, rhetorical devices, and delivery techniques used in a speech to understand how it conveys hope and its potential impact on the audience. The objective of this study is to classify the given text comments as Hope Speech or Not Hope Speech. The provided dataset consists of YouTube comments in four languages: English, Hindi, Spanish, Bulgarian; with pre-defined classifications. Our approach involved pre-processing the dataset and using the TF-IDF (Term Frequency-Inverse Document Frequency) method.

pdf bib abs

ML&AI_IIITRanchi@LT-EDI-2023: Hybrid Model for Text Classification for Identification of Various Types of Depression
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma

DepSign–LT–EDI@RANLP–2023 is a dedicated task that addresses the crucial issue of identifying indications of depression in individuals through their social media posts, which serve as a platform for expressing their emotions and sentiments. The primary objective revolves around accurately classifying the signs of depression into three distinct categories: “not depressed,” “moderately depressed,” and “severely depressed.” Our study entailed the utilization of machine learning algorithms, coupled with a diverse range of features such as sentence embeddings, TF-IDF, and Bag-of-Words. Remarkably, the adoption of hybrid models yielded promising outcomes, culminating in a 10th-place ranking with a macro F1-score of 0.408. This research underscores the effectiveness and potential of employing advanced text classification methodologies to discern and identify signs of depression within social media data. The findings hold implications for the development of mental health monitoring systems and support mechanisms, contributing to the well-being of individuals in need.

pdf bib abs

VEL@LT-EDI: Detecting Homophobia and Transphobia in Code-Mixed Spanish Social Media Comments
Prasanna Kumar Kumaresan | Kishore Kumar Ponnusamy | Kogilavani Shanmugavadivel | Subalalitha Chinnaudayar Navaneethakrishnan | Ruba Priyadharshini | Bharathi Raja Chakravarthi

Our research aims to address the task of detecting homophobia and transphobia in social media code-mixed comments written in Spanish. Code-mixed text in social media often violates strict grammar rules and incorporates non-native scripts, posing challenges for identification. To tackle this problem, we perform pre-processing by removing unnecessary content and establishing a baseline for detecting homophobia and transphobia. Furthermore, we explore the effectiveness of various traditional machine-learning models with feature extraction and pre-trained transformer model techniques. Our best configurations achieve macro F1 scores of 0.84 on the test set and 0.82 on the development set for Spanish, demonstrating promising results in detecting instances of homophobia and transphobia in code-mixed comments.

pdf bib abs

TechSSN4@LT-EDI-2023: Depression Sign Detection in Social Media Postings using DistilBERT Model
Krupa Elizabeth Thannickal | Sanmati P | Rajalakshmi Sivanaiah | Angel Deborah S

As the world population increases, more people are living to the age when depression or Major Depressive Disorder (MDD) commonly occurs. Consequently, the number of those who suffer from such disorders is rising. There is a pressing need for faster and more reliable diagnosis methods. This paper proposes a method to analyse text input from social media posts of subjects to determine the severity class of depression. We have used the DistilBERT transformer to process these texts and classify the individuals across three severity labels - ‘not depression’, ‘moderate’ and ‘severe’. The results showed a macro F1-score of 0.437 when the model was trained for 5 epochs, with comparable performance across the labels. The team ranked 6th, while the top team achieved a macro F1-score of 0.470. We hope that this system will support further research into the early identification of depression in individuals to promote effective medical research and related treatments.

pdf bib abs

The Mavericks@LT-EDI-2023: Detection of signs of Depression from social Media Texts using Navie Bayse approach
Sathvika V S | Vaishnavi Vaishnavi S | Angel Deborah S | Rajalakshmi Sivanaiah | Mirnalinee ThankaNadar

Social media platforms have revolutionized the landscape of communication, providing individuals with an outlet to express their thoughts, emotions, and experiences openly. This paper focuses on the development of a model to determine whether individuals exhibit signs of depression based on their social media texts. With the aim of optimizing performance and accuracy, a Naive Bayes approach was chosen for the detection task. The Naive Bayes algorithm, a probabilistic classifier, was applied to extract features and classify the texts. The model leveraged linguistic patterns, sentiment analysis, and other relevant features to capture indicators of depression within the texts. Preprocessing techniques, including tokenization, stemming, and stop-word removal, were employed to enhance the quality of the input data. The performance of the Naive Bayes model was evaluated using standard metrics such as accuracy, precision, recall, and F1-score; it achieved a macro-averaged F1 score of 0.263.

pdf bib abs

hate-alert@LT-EDI-2023: Hope Speech Detection Using Transformer-Based Models
Mithun Das | Shubhankar Barman | Subhadeep Chatterjee

Social media platforms have become integral to our daily lives, facilitating instant sharing of thoughts and ideas. While these platforms often host inspiring, motivational, and positive content, the research community has recognized the significance of such messages by labeling them as “hope speech”. In light of this, we delve into the detection of hope speech on social media platforms. Specifically, we explore various transformer-based model setups for the LT-EDI shared task at RANLP 2023. We observe that the performance of the models varies across languages. Overall, the finetuned m-BERT model showcases the best performance among all the models across languages. Our models secured the first position in Bulgarian and Hindi languages and achieved the third position for the Spanish language in the respective task.

pdf bib abs

Hope is a cheerful and optimistic state of mind which has its basis in the expectation of positive outcomes. Hope speech reflects this state: it consists of positive words that can motivate and encourage a person to do better. Non-hope speech reflects the exact opposite: it is meant to ridicule or put down someone and affect the person negatively. The Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI - RANLP 2023 was created with data sets in English, Spanish, Bulgarian and Hindi. The purpose of this task is to classify human-generated comments on the platform YouTube as Hope speech or non-Hope speech. We employed multiple traditional models such as SVM (support vector machine), Random Forest classifier, Naive Bayes and Logistic Regression. Support Vector Machine gave the highest macro average F1 score of 0.49 for the training data set and a macro average F1 score of 0.50 for the test data set.

pdf bib abs

Interns@LT-EDI : Detecting Signs of Depression from Social Media Text
Koushik L | Hariharan R. L | Anand Kumar M

This submission presents our approach for depression detection in social media text. The methodology includes data collection, preprocessing (SMOTE), feature extraction/selection (TF-IDF and GloVe), model development (SVM, CNN and Bi-LSTM), training, evaluation, optimisation, and validation. The proposed methodology aims to contribute to the accurate detection of depression.

pdf bib abs

The advent of social media platforms has revolutionized the way we interact, share, learn, express and build our views and ideas. One major challenge of social media is hate speech. Homophobia and transphobia encompass a range of negative attitudes and feelings towards people based on their sexual orientation or gender identity. Homophobia refers to the fear, hatred, or prejudice against homosexuality, while transphobia involves discrimination against transgender individuals. Natural Language Processing can be used to identify homophobic and transphobic texts and help make social media a safer place. In this paper, we explore using Support Vector Machine, Random Forest Classifier and BERT models for homophobia and transphobia detection. The best model was a combination of LaBSE and SVM that achieved a weighted F1 score of 0.95.

pdf bib abs

DeepLearningBrasil@LT-EDI-2023: Exploring Deep Learning Techniques for Detecting Depression in Social Media Text
Eduardo Garcia | Juliana Gomes | Adalberto Ferreira Barbosa Junior | Cardeque Henrique Bittes de Alvarenga Borges | Nadia Félix Felipe da Silva

In this paper, we delineate the strategy employed by our team, DeepLearningBrasil, which secured us first place in the shared task DepSign-LT-EDI@RANLP-2023 with an advantage of 2.4%. The task was to classify social media texts into three distinct levels of depression - “not depressed,” “moderately depressed,” and “severely depressed.” Leveraging the power of the RoBERTa and DeBERTa models, we further pre-trained them on a collected Reddit dataset, specifically curated from mental health-related Reddit communities (subreddits), leading to an enhanced understanding of nuanced mental health discourse. To address lengthy textual data, we introduced truncation techniques that retained the essence of the content by focusing on its beginnings and endings. We made our model robust to unbalanced data by incorporating sample weights into the loss. Cross-validation and ensemble techniques were then employed to combine our k-fold trained models, delivering an optimal solution. The accompanying code is made available for transparency and further development.
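
Two of the techniques named in this abstract, head-and-tail truncation and weighting the loss for class imbalance, can be illustrated with a short sketch. The split between head and tail and the inverse-frequency formula below are assumptions chosen for illustration (the formula is the familiar "balanced" heuristic); the abstract does not state which values or weighting scheme the authors used.

```python
from collections import Counter
from typing import Dict, List, Sequence


def head_tail_truncate(tokens: Sequence[str], max_len: int, head_len: int) -> List[str]:
    """Keep the first head_len tokens and the last (max_len - head_len) tokens."""
    if not 0 <= head_len <= max_len:
        raise ValueError("head_len must be between 0 and max_len")
    if len(tokens) <= max_len:
        return list(tokens)  # short inputs pass through unchanged
    tail_len = max_len - head_len
    tail = list(tokens[-tail_len:]) if tail_len > 0 else []
    return list(tokens[:head_len]) + tail


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Toy usage with made-up inputs.
truncated = head_tail_truncate(list("abcdefgh"), max_len=4, head_len=2)
weights = inverse_frequency_weights(["moderate", "moderate", "moderate", "severe"])
```

In a training loop, each example's loss term would be multiplied by the weight of its gold label, so that rare classes contribute as much to the total loss as frequent ones.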

pdf bib abs

MUCS@LT-EDI2023: Learning Approaches for Hope Speech Detection in Social Media Text
Asha Hegde | Kavya G | Sharal Coelho | Hosahalli Lakshmaiah Shashirekha

Hope plays a significant role in shaping human thoughts and actions, yet hope content has received limited attention in the realm of social media data analysis. The exploration of hope content helps to uncover valuable insights into users’ aspirations, expectations, and emotional states. By delving into the analysis of hope content on social media platforms, researchers and analysts can gain a deeper understanding of how hope influences individuals’ behaviors, decisions, and overall well-being in the digital age. However, this area is rarely explored even for high-resource languages. To address the identification of hope text in social media platforms, this paper describes the models submitted by the team MUCS to the “Hope Speech Detection for Equality, Diversity, and Inclusion (LT-EDI)” shared task organized at Recent Advances in Natural Language Processing (RANLP) - 2023. This shared task aims to classify a comment/post in English and code-mixed texts in three languages, namely, Bulgarian, Spanish, and Hindi into one of the two predefined categories, namely, “Hope speech” and “Non Hope speech”. Two models are proposed for the shared task to classify the given text into Hope or Non-Hope categories: i) Hope_BERT, a Linear Support Vector Classifier (LinearSVC) model trained by combining Bidirectional Encoder Representations from Transformers (BERT) embeddings and Term Frequency-Inverse Document Frequency (TF-IDF) of character n-grams with word boundary (char_wb) for English, and ii) Hope_mBERT, a LinearSVC model trained by combining Multilingual BERT (mBERT) embeddings and TF-IDF of char_wb for Bulgarian, Spanish, and Hindi code-mixed texts. The proposed models obtained 1st, 1st, 2nd, and 5th ranks for Spanish, Bulgarian, Hindi, and English texts respectively.
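
To make the "character n-grams with word boundary (char_wb)" feature concrete, the sketch below extracts such n-grams in plain Python. It is a simplified illustration modelled loosely on the `char_wb` analyzer idea, not the authors' code: each word is padded with one space on either side, n-grams never span two words, and (an assumption of this sketch) words too short for a given n are skipped. The function name `char_wb_ngrams` is hypothetical.

```python
from typing import List


def char_wb_ngrams(text: str, min_n: int, max_n: int) -> List[str]:
    """Character n-grams taken inside word boundaries, each word padded by spaces."""
    ngrams = []
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            if len(padded) < n:
                break  # longer n-grams cannot fit in this word either
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


trigrams = char_wb_ngrams("Hope on", min_n=3, max_n=3)
```

Counting these n-grams per comment and applying TF-IDF weighting would give the sparse features that the abstract describes combining with BERT or mBERT embeddings. Sub-word features of this kind tend to tolerate the spelling variation typical of code-mixed text.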

pdf bib abs

MUCS@LT-EDI2023: Homophobic/Transphobic Content Detection in Social Media Text using mBERT
Asha Hegde | Kavya G | Sharal Coelho | Hosahalli Lakshmaiah Shashirekha

Homophobic/Transphobic (H/T) content includes hate speech, discrimination text, and abusive comments against Gay, Lesbian, Bisexual, Transgender, Queer, and Intersex (LGBTQ) individuals. With the increase in user generated text in social media, there has been an increase in code-mixed H/T content, which poses challenges for efficient analysis and detection of H/T content on social media. The complex nature of code-mixed text necessitates the development of advanced tools and techniques to effectively tackle this issue in social media platforms. To tackle this issue, in this paper, we - team MUCS, describe the transformer based models submitted to “Homophobia/Transphobia Detection in social media comments” shared task in Language Technology for Equality, Diversity and Inclusion (LT-EDI) at Recent Advances in Natural Language Processing (RANLP)-2023. The proposed methodology makes use of resampling the training data to handle the data imbalance and this resampled data is used to fine-tune the Multilingual Bidirectional Encoder Representations from Transformers (mBERT) models. These models obtained 11th, 5th, 3rd, 3rd, and 7th ranks for English, Tamil, Malayalam, Spanish, and Hindi respectively in Task A and 8th, 2nd, and 2nd ranks for English, Tamil, and Malayalam respectively in Task B.

pdf bib abs

MUCS@LT-EDI2023: Detecting Signs of Depression in Social Media Text
Sharal Coelho | Asha Hegde | Kavya G | Hosahalli Lakshmaiah Shashirekha

Depression can lead to significant changes in individuals’ posts on social media, and identifying these changes is an important task. Automated techniques must be created for the identification task as manually analyzing the growing volume of social media data is time-consuming. To identify signs of depression in social media posts, in this paper, we - team MUCS, describe a Transfer Learning (TL) model and Machine Learning (ML) models submitted to the “Detecting Signs of Depression from Social Media Text” shared task organised by DepSign-LT-EDI@RANLP-2023. The TL model is trained using raw text with Bidirectional Encoder Representations from Transformers (BERT) and the ML model is trained using Term Frequency-Inverse Document Frequency (TF-IDF) features separately. Among these three models, the TL model performed better with a macro averaged F1-score of 0.361 and ranked 20th in the shared task.

pdf bib abs

The goal of this study is to use machine learning approaches to detect depression indications in social media articles. Data gathering, pre-processing, feature extraction, model training, and performance evaluation are all aspects of the research. The collection consists of social media messages classified into three categories: not depressed, somewhat depressed, and severely depressed. The study contributes to the growing field of social media data-driven mental health analysis by stressing the use of feature extraction algorithms for obtaining relevant information from text data. The use of social media communications to detect depression has the potential to increase early intervention and help for people at risk. Several feature extraction approaches, such as TF-IDF, Count Vectorizer, and Hashing Vectorizer, are used to quantitatively represent textual data. These features are used to train and evaluate a wide range of machine learning models, including Logistic Regression, Random Forest, Decision Tree, Gaussian Naive Bayes, and Multinomial Naive Bayes. To assess the performance of the models, metrics such as accuracy, precision, recall, F1 score, and the confusion matrix are utilized. The Random Forest model with Count Vectorizer had the highest accuracy on the development dataset, at 92.99 percent. With a macro F1-score of 0.362, we placed 19th in the shared task. The findings show that machine learning is effective in detecting depression markers in social media articles.
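
Accuracy and macro F1 can diverge sharply on imbalanced data, which is one possible contributor to the gap between the two figures reported in this abstract (they were also measured on different data splits). The sketch below computes both metrics on made-up labels; the data and the majority-class predictor are hypothetical and chosen only to show the effect.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, heavily imbalanced labels: 9 "moderate", 1 "severe".
y_true = ["moderate"] * 9 + ["severe"]
y_pred = ["moderate"] * 10  # a classifier that always predicts the majority class

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Here the always-majority classifier reaches 90% accuracy, but its macro F1 is below 0.5 because the "severe" class contributes an F1 of zero and every class counts equally in the average.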

pdf bib abs

Flamingos_python@LT-EDI-2023: An Ensemble Model to Detect Severity of Depression
Abirami P S | Amritha S | Pavithra Meganathan | Jerin Mahibha C

The prevalence of depression is increasing globally, and there is a need for effective screening and detection tools. Social media platforms offer a rich source of data for mental health research. The paper aims to detect the signs of depression of a person from their social media postings wherein people share their feelings and emotions. The task is to create a system that, given social media posts in English, should classify the level of depression as ‘not depressed’, ‘moderately depressed’ or ‘severely depressed’. The paper presents the solution for the Shared Task on Detecting Signs of Depression from Social Media Text at LT-EDI@RANLP 2023. The proposed system aims to develop a machine learning model using algorithms such as SVM, Random Forest and Naive Bayes to detect signs of depression from social media text. The model is trained on a dataset of social media posts to detect the level of depression of the individuals as ‘not depressed’, ‘moderately depressed’ or ‘severely depressed’. The dataset is pre-processed to remove duplicates and irrelevant features, and then feature engineering techniques are used to extract meaningful features from the text data. The model is trained on these features to classify the text into the three categories. The performance of the model is evaluated using metrics such as accuracy, precision, recall, and F1-score. An ensemble model is used to combine these algorithms, giving an accuracy of 90.2% and an F1 score of 0.90. The results of the proposed approach could potentially aid in the early detection and prevention of depression for individuals who may be at risk.