Sunil Saumya


2022

pdf bib
CURAJ_IIITDWD@LT-EDI-ACL 2022: Hope Speech Detection in English YouTube Comments using Deep Learning Techniques
Vanshita Jha | Ankit Mishra | Sunil Saumya
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Hope Speech are positive terms that help to promote or criticise a point of view without hurting the user’s or community’s feelings. Non-Hope Speech, on the other side, includes expressions that are harsh, ridiculing, or demotivating. The goal of this article is to find the hope speech comments in a YouTube dataset. The datasets were created as part of the “LT-EDI-ACL 2022: Hope Speech Detection for Equality, Diversity, and Inclusion” shared task. The shared task dataset was proposed in Malayalam, Tamil, English, Spanish, and Kannada languages. In this paper, we worked at English-language YouTube comments. We employed several deep learning based models such as DNN (dense or fully connected neural network), CNN (Convolutional Neural Network), Bi-LSTM (Bidirectional Long Short Term Memory Network), and GRU(Gated Recurrent Unit) to identify the hopeful comments. We also used Stacked LSTM-CNN and Stacked LSTM-LSTM network to train the model. The best macro average F1-score 0.67 for development dataset was obtained using the DNN model. The macro average F1-score of 0.67 was achieved for the classification done on the test data as well.

pdf bib
SOA_NLP@LT-EDI-ACL2022: An Ensemble Model for Hope Speech Detection from YouTube Comments
Abhinav Kumar | Sunil Saumya | Pradeep Roy
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Language should be accommodating of equality and diversity as a fundamental aspect of communication. The language of internet users has a big impact on peer users all over the world. On virtual platforms such as Facebook, Twitter, and YouTube, people express their opinions in different languages. People respect others’ accomplishments, pray for their well-being, and cheer them on when they fail. Such motivational remarks are hope speech remarks. Simultaneously, a group of users encourages discrimination against women, people of color, people with disabilities, and other minorities based on gender, race, sexual orientation, and other factors. To recognize hope speech from YouTube comments, the current study offers an ensemble approach that combines a support vector machine, logistic regression, and random forest classifiers. Extensive testing was carried out to discover the best features for the aforementioned classifiers. In the support vector machine and logistic regression classifiers, char-level TF-IDF features were used, whereas in the random forest classifier, word-level features were used. The proposed ensemble model performed significantly well among English, Spanish, Tamil, Malayalam, and Kannada YouTube comments.

pdf bib
IIITDWD@TamilNLP-ACL2022: Transformer-based approach to classify abusive content in Dravidian Code-mixed text
Shankar Biradar | Sunil Saumya
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

Identifying abusive content or hate speech in social media text has raised the research community’s interest in recent times. The major driving force behind this is the widespread use of social media websites. Further, it also leads to identifying abusive content in low-resource regional languages, which is an important research problem in computational linguistics. As part of ACL-2022, organizers of DravidianLangTech@ACL 2022 have released a shared task on abusive category identification in Tamil and Tamil-English code-mixed text to encourage further research on offensive content identification in low-resource Indic languages. This paper presents the working notes for the model submitted by IIITDWD at DravidianLangTech@ACL 2022. Our team competed in Sub-Task B and finished in 9th place among the participating teams. In our proposed approach, we used a pre-trained transformer model such as Indic-bert for feature extraction, and on top of that, SVM classifier is used for stance detection. Further, our model achieved 62 % accuracy on code-mixed Tamil-English text.

pdf bib
Are you a hero or a villain? A semantic role labelling approach for detecting harmful memes.
Shaik Fharook | Syed Sufyan Ahmed | Gurram Rithika | Sumith Sai Budde | Sunil Saumya | Shankar Biradar
Proceedings of the Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situations

Identifying good and evil through representations of victimhood, heroism, and villainy (i.e., role labeling of entities) has recently caught the research community’s interest. Because of the growing popularity of memes, the amount of offensive information published on the internet is expanding at an alarming rate. It generated a larger need to address this issue and analyze the memes for content moderation. Framing is used to show the entities engaged as heroes, villains, victims, or others so that readers may better anticipate and understand their attitudes and behaviors as characters. Positive phrases are used to characterize heroes, whereas negative terms depict victims and villains, and terms that tend to be neutral are mapped to others. In this paper, we propose two approaches to role label the entities of the meme as hero, villain, victim, or other through Named-Entity Recognition(NER), Sentiment Analysis, etc. With an F1-score of 23.855, our team secured eighth position in the Shared Task @ Constraint 2022.

2021

pdf bib
IIIT_DWD@LT-EDI-EACL2021: Hope Speech Detection in YouTube multilingual comments
Sunil Saumya | Ankit Kumar Mishra
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

Language as a significant part of communication should be inclusive of equality and diversity. The internet user’s language has a huge influence on peer users all over the world. People express their views through language on virtual platforms like Facebook, Twitter, YouTube etc. People admire the success of others, pray for their well-being, and encourage on their failure. Such inspirational comments are hope speech comments. At the same time, a group of users promotes discrimination based on gender, racial, sexual orientation, persons with disability, and other minorities. The current paper aims to identify hope speech comments which are very important to move on in life. Various machine learning and deep learning based models (such as support vector machine, logistics regression, convolutional neural network, recurrent neural network) are employed to identify the hope speech in the given YouTube comments. The YouTube comments are available in English, Tamil and Malayalam languages and are part of the task “EACL-2021:Hope Speech Detection for Equality, Diversity and Inclusion”.

pdf bib
Offensive language identification in Dravidian code mixed social media text
Sunil Saumya | Abhinav Kumar | Jyoti Prakash Singh
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Hate speech and offensive language recognition in social media platforms have been an active field of research over recent years. In non-native English spoken countries, social media texts are mostly in code mixed or script mixed/switched form. The current study presents extensive experiments using multiple machine learning, deep learning, and transfer learning models to detect offensive content on Twitter. The data set used for this study are in Tanglish (Tamil and English), Manglish (Malayalam and English) code-mixed, and Malayalam script-mixed. The experimental results showed that 1 to 6-gram character TF-IDF features are better for the said task. The best performing models were naive bayes, logistic regression, and vanilla neural network for the dataset Tamil code-mix, Malayalam code-mixed, and Malayalam script-mixed, respectively instead of more popular transfer learning models such as BERT and ULMFiT and hybrid deep models.

pdf bib
IIIT_DWD@EACL2021: Identifying Troll Meme in Tamil using a hybrid deep learning approach
Ankit Kumar Mishra | Sunil Saumya
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Social media are an open forum that allows people to share their knowledge, abilities, talents, ideas, or expressions. Simultaneously, it also allows people to post disrespectful, trolling, defamation, or negative content targeting users or the community based on their gender, race, religious beliefs, etc. Such posts are available in the form of text, image, video, and meme. Among them, memes are currently widely used to disseminate offensive material amongst people. It is primarily in the form of pictures and text. In the present paper, troll memes are identified, which is necessary to create a healthy society. To do so, a hybrid deep learning model combining convolutional neural networks and bidirectional long short term memory is proposed to identify trolled memes. The dataset used in the study is a part of the competition EACL 2021: Troll Meme classification in Tamil. The proposed model obtained 10th rank in the competition and reported a precision of 0.52, recall 0.59, and weighted F10.3.