Jayant Panwar
2023
PanwarJayant at SemEval-2023 Task 10: Exploring the Effectiveness of Conventional Machine Learning Techniques for Online Sexism Detection
Jayant Panwar
|
Radhika Mamidi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
The rapid growth of online communication using social media platforms has led to an increase in the presence of hate speech, especially in terms of sexist language online. The proliferation of such hate speech has a significant impact on the mental health and well-being of the users and hence the need for automated systems to detect and filter such texts. In this study, we explore the effectiveness of conventional machine learning techniques for detecting sexist text. We explore five conventional classifiers, namely, Logistic Regression, Decision Tree, XGBoost, Support Vector Machines, and Random Forest. The results show that different classifiers perform differently on each task due to their different inherent architectures which may be suited to a certain problem more. These models are trained on the shared task dataset, which includes both sexist and non-sexist texts. All in all, this study explores the potential of conventional machine learning techniques in detecting online sexist content. The results of this study highlight the strengths and weaknesses of all classifiers with respect to all subtasks. The results of this study will be useful for researchers and practitioners interested in developing systems for detecting or filtering online hate speech.
DAP-LeR-DAug: Techniques for enhanced Online Sexism Detection
Jayant Panwar
|
Radhika Mamidi
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)
Text-2-Wiki: Summarization and Template-driven Article Generation
Jayant Panwar
|
Radhika Mamidi
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Users on Wikipedia collaborate in a structured and organized manner to publish and update articles on numerous topics, which makes Wikipedia a very rich source of knowledge. English Wikipedia has the most amount of information available (more than 6.7 million articles); however, there are few good informative articles on Wikipedia in Indian languages. Hindi Wikipedia has approximately only 160k articles. The same article in Hindi can be vastly different from its English version and generally contains less information. This poses a problem for native Indian language speakers who are not proficient in English. Therefore, having the same amount of information in Indian Languages will help promote knowledge among those who are not well-versed in English. Publishing the articles manually, like the usual process in Global English Wikipedia, is a timeconsuming process. To get the amount of information in native Indian languages up-to-speed with the amount of information in English, automating the whole article generation process is the best option. In this study, we present a stage-wise approach ranging from Data Collection to Summarization and Translation, and finally ending with Template Creation. This approach ensures the efficient generation of a large amount of content in Hindi Wikipedia in less time. With the help of this study, we were able to successfully generate more than a thousand articles in Hindi Wikipedia with ease.
Search