Abhishek R
2024
Wit Hub@DravidianLangTech-2024:Multimodal Social Media Data Analysis in Dravidian Languages using Machine Learning Models
Anierudh S
|
Abhishek R
|
Ashwin Sundar
|
Amrit Krishnan
|
Bharathi B
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The main objective of the task is categorised into three subtasks. Subtask-1 Build models to determine the sentiment expressed in multimodal posts (or videos) in Tamil and Malayalam languages, leveraging textual, audio, and visual components. The videos are labelled into five categories: highly positive, positive, neutral, negative and highly negative. Subtask-2 Design machine models that effectively identify and classify abusive language within the multimodal context of social media posts in Tamil. The data are categorized into abusive and non-abusive categories. Subtask-3 Develop advanced models that accurately detect and categorize hate speech and offensive language in multimodal social media posts in Dravidian languages. The data points are categorized into Caste, Offensive, Racist and Sexist classes. In this session, the focus is primarily on Tamil language text data analysis. Various combination of machine learning models have been used to perform each tasks and do oversampling techniques to train models on biased dataset.