Asha Hegde


2022

pdf bib
MUCS@Text-LT-EDI@ACL 2022: Detecting Sign of Depression from Social Media Text using Supervised Learning Approach
Asha Hegde | Sharal Coelho | Ahmad Elyas Dashti | Hosahalli Shashirekha
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Social media has seen enormous growth in its users recently and knowingly or unknowingly the behavior of a person will be reflected in the comments she/he posts on social media. Users having the sign of depression may post negative or disturbing content seeking the attention of other users. Hence, social media data can be analysed to check whether the users’ have the sign of depression and help them to get through the situation if required. However, as analyzing the increasing amount of social media data manually in laborious and error-prone, automated tools have to be developed for the same. To address the issue of detecting the sign of depression content on social media, in this paper, we - team MUCS, describe an Ensemble of Machine Learning (ML) models and a Transfer Learning (TL) model submitted to “Detecting Signs of Depression from Social Media Text-LT-EDI@ACL 2022” (DepSign-LT-EDI@ACL-2022) shared task at Association for Computational Linguistics (ACL) 2022. Both frequency and text based features are used to train an Ensemble model and Bidirectional Encoder Representations from Transformers (BERT) fine-tuned with raw text is used to train the TL model. Among the two models, the TL model performed better with a macro averaged F-score of 0.479 and placed 18th rank in the shared task. The code to reproduce the proposed models is available in github page1.

pdf bib
MUCS@DravidianLangTech@ACL2022: Ensemble of Logistic Regression Penalties to Identify Emotions in Tamil Text
Asha Hegde | Sharal Coelho | Hosahalli Shashirekha
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

Emotion Analysis (EA) is the process of automatically analyzing and categorizing the input text into one of the predefined sets of emotions. In recent years, people have turned to social media to express their emotions, opinions or feelings about news, movies, products, services, and so on. These users’ emotions may help the public, governments, business organizations, film producers, and others in devising strategies, making decisions, and so on. The increasing number of social media users and the increasing amount of user generated text containing emotions on social media demands automated tools for the analysis of such data as handling this data manually is labor intensive and error prone. Further, the characteristics of social media data makes the EA challenging. Most of the EA research works have focused on English language leaving several Indian languages including Tamil unexplored for this task. To address the challenges of EA in Tamil texts, in this paper, we - team MUCS, describe the model submitted to the shared task on Emotion Analysis in Tamil at DravidianLangTech@ACL 2022. Out of the two subtasks in this shared task, our team submitted the model only for Task a. The proposed model comprises of an Ensemble of Logistic Regression (LR) classifiers with three penalties, namely: L1, L2, and Elasticnet. This Ensemble model trained with Term Frequency - Inverse Document Frequency (TF-IDF) of character bigrams and trigrams secured 4th rank in Task a with a macro averaged F1-score of 0.04. The code to reproduce the proposed models is available in github1.

pdf bib
Overview of the Shared Task on Machine Translation in Dravidian Languages
Anand Kumar Madasamy | Asha Hegde | Shubhanker Banerjee | Bharathi Raja Chakravarthi | Ruba Priyadharshini | Hosahalli Shashirekha | John McCrae
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

This paper presents an outline of the shared task on translation of under-resourced Dravidian languages at DravidianLangTech-2022 workshop to be held jointly with ACL 2022. A description of the datasets used, approach taken for analysis of submissions and the results have been illustrated in this paper. Five sub-tasks organized as a part of the shared task include the following translation pairs: Kannada to Tamil, Kannada to Telugu, Kannada to Sanskrit, Kannada to Malayalam and Kannada to Tulu. Training, development and test datasets were provided to all participants and results were evaluated on the gold standard datasets. A total of 16 research groups participated in the shared task and a total of 12 submission runs were made for evaluation. Bilingual Evaluation Understudy (BLEU) score was used for evaluation of the translations.

2021

pdf bib
MUCS@ - Machine Translation for Dravidian Languages using Stacked Long Short Term Memory
Asha Hegde | Ibrahim Gashaw | Shashirekha H.l.
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Dravidian language family is one of the largest language families in the world. In spite of its uniqueness, Dravidian languages have gained very less attention due to scarcity of resources to conduct language technology tasks such as translation, Parts-of-Speech tagging, Word Sense Disambiguation etc,. In this paper, we, team MUCS, describe sequence-to-sequence stacked Long Short Term Memory (LSTM) based Neural Machine Translation (NMT) model submitted to “Machine Translation in Dravidian languages”, a shared task organized by EACL-2021. The NMT model was applied on translation using English-Tamil, EnglishTelugu, English-Malayalam and Tamil-Telugu corpora provided by the organizers. Standard evaluation metrics namely Bilingual Evaluation Understudy (BLEU) and human evaluations are used to evaluate the model. Our models exhibited good accuracy for all the language pairs and obtained 2nd rank for TamilTelugu language pair.

2020

pdf bib
MUCS@Adap-MT 2020: Low Resource Domain Adaptation for Indic Machine Translation
Asha Hegde | H.l. Shashirekha
Proceedings of the 17th International Conference on Natural Language Processing (ICON): Adap-MT 2020 Shared Task

Machine Translation (MT) is the task of automatically converting the text in source language to text in target language by preserving the meaning. MT usually require large corpus for training the translation models. Due to scarcity of resources very less attention is given to translating into low resource languages and in particular into Indic languages. In this direction, a shared task called “Adap-MT 2020: Low Resource Domain Adaptation for Indic Machine Translation” is organized to illustrate the capability of general domain MT when translating into Indic languages and low resource domain adaptation of MT systems. In this paper, we, team MUCS, describe a simple word extraction based domain adaptation approach applied to English-Hindi MT only. MT in the proposed model is carried out using Open-NMT - a popular Neural Machine Translation tool. A general domain corpus is built effectively combining the available English-Hindi corpora and removing the duplicate sentences. Further, domain specific corpus is updated by extracting the sentences from generic corpus that contains the words given in the domain specific corpus. The proposed model exhibited satisfactory results for small domain specific AI and CHE corpora provided by the organizers in terms of BLEU score with 1.25 and 2.72 respectively. Further, this methodology is quite generic and can easily be extended to other low resource language pairs as well.