Dhanya Srinivasan
2022
SSNCSE_NLP@LT-EDI-ACL2022: Hope Speech Detection for Equality, Diversity and Inclusion using sentence transformers
Bharathi B | Dhanya Srinivasan | Josephine Varsha | Thenmozhi Durairaj | Senthil Kumar B
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
In recent times, applications have been developed to regulate and control the spread of negativity and toxicity on online platforms. The world already faces serious problems such as political and religious conflicts, wars, and pandemics, and offensive hate speech is the last thing we need. Our task was to classify a text as either ‘Hope Speech’ or ‘Non-Hope Speech’. The datasets, provided by the LT-EDI organizers in English, Tamil, Spanish, Kannada, and Malayalam, consist of YouTube comments that either offer support, reassurance, inspiration, and insight, or do not. To identify and classify them, we employed several transformer-based models such as m-BERT, MLNet, BERT, XLMRoberta, and XLM_MLM. The observed results indicate that BERT and m-BERT obtained the best results among all the techniques, achieving weighted F1-scores of 0.92, 0.71, 0.76, 0.87, and 0.83 for English, Tamil, Spanish, Kannada, and Malayalam respectively. This paper describes our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI-ACL 2022.
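As an illustration of the classification setup described in the abstract, the sketch below runs a multilingual BERT checkpoint as a two-class (Hope Speech vs. Non-Hope Speech) classifier and scores predictions with a weighted F1-score. It is a minimal sketch under assumptions, not the authors' exact pipeline: the checkpoint name, the toy comments, and the omission of the fine-tuning loop are all illustrative.

import torch
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# assumption: a multilingual BERT checkpoint with a fresh 2-label classification head
MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# toy YouTube-style comments and labels (1 = Hope Speech, 0 = Non-Hope Speech)
texts = ["You can get through this, we are all with you!", "Nothing will ever change."]
labels = [1, 0]

# in practice the classifier is first fine-tuned on the LT-EDI training split;
# here we show only the inference and scoring steps
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
preds = logits.argmax(dim=-1).tolist()

# weighted F1-score, the metric reported in the shared task
print("weighted F1:", f1_score(labels, preds, average="weighted"))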
SSNCSE_NLP@LT-EDI-ACL2022: Speech Recognition for Vulnerable Individuals in Tamil using pre-trained XLSR models
Dhanya Srinivasan | Bharathi B | Thenmozhi Durairaj | Senthil Kumar B
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Automatic speech recognition is a tool used to transform human speech into a written form. It is used in a variety of avenues, such as voice commands, customer service, and more, and has emerged as an essential tool in the digitisation of daily life. It is also of vital importance in making the lives of elderly and disabled people much easier. In this paper, we describe an automatic speech recognition system for Tamil, selected from three pre-trained models fine-tuned from the Facebook XLSR Wav2Vec2 model, which was trained using the Common Voice dataset. The best model for speech recognition in Tamil is determined by comparing word error rates on the data. This work explains the submission made by SSNCSE_NLP to the shared task organized by LT-EDI at ACL 2022. A word error rate of 39.4512 is achieved.
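The sketch below illustrates the kind of evaluation the abstract describes: transcribing a Tamil audio clip with an XLSR Wav2Vec2 CTC model via Hugging Face transformers and computing the word error rate with jiwer. It is a minimal sketch under assumptions, not the authors' exact setup; the checkpoint path, audio file, and reference transcript are placeholders.

import torch
import torchaudio
from jiwer import wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# placeholder: any XLSR Wav2Vec2 checkpoint fine-tuned for Tamil with a CTC head
CHECKPOINT = "path/to/xlsr-wav2vec2-tamil-ctc"
processor = Wav2Vec2Processor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT)

# load a waveform and resample to the 16 kHz rate the model expects
speech, sample_rate = torchaudio.load("sample_tamil.wav")  # placeholder audio file
speech = torchaudio.functional.resample(speech, sample_rate, 16000).squeeze().numpy()

# greedy CTC decoding to obtain a hypothesis transcript
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
pred_ids = torch.argmax(logits, dim=-1)
hypothesis = processor.batch_decode(pred_ids)[0]

# compare against a reference transcript using word error rate
reference = "reference Tamil transcript goes here"  # placeholder
print("WER:", wer(reference, hypothesis))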