Monil Gokani
2023
Witcherses at SemEval-2023 Task 12: Ensemble Learning for African Sentiment Analysis
Monil Gokani
|
K V Aditya Srivatsa
|
Radhika Mamidi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes our system submission for SemEval-2023 Task 12 AfriSenti-SemEval: Sentiment Analysis for African Languages. We propose an XGBoost-based ensemble model trained on emoticon frequency-based features and the predictions of several statistical models such as SVMs, Logistic Regression, Random Forests, and BERT-based pre-trained language models such as AfriBERTa and AfroXLMR. We also report results from additional experiments not in the system. Our system achieves a mixed bag of results, achieving a best rank of 7th in three of the languages - Igbo, Twi, and Yoruba.
GSAC: A Gujarati Sentiment Analysis Corpus from Twitter
Monil Gokani
|
Radhika Mamidi
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
Sentiment Analysis is an important task for analysing online content across languages for tasks such as content moderation and opinion mining. Though a significant amount of resources are available for Sentiment Analysis in several Indian languages, there do not exist any large-scale, open-access corpora for Gujarati. Our paper presents and describes the Gujarati Sentiment Analysis Corpus (GSAC), which has been sourced from Twitter and manually annotated by native speakers of the language. We describe in detail our collection and annotation processes and conduct extensive experiments on our corpus to provide reliable baselines for future work using our dataset.
2021
SimpleNER Sentence Simplification System for GEM 2021
K V Aditya Srivatsa
|
Monil Gokani
|
Manish Shrivastava
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)
This paper describes SimpleNER, a model developed for the sentence simplification task at GEM-2021. Our system is a monolingual Seq2Seq Transformer architecture that uses control tokens pre-pended to the data, allowing the model to shape the generated simplifications according to user desired attributes. Additionally, we show that NER-tagging the training data before use helps stabilize the effect of the control tokens and significantly improves the overall performance of the system. We also employ pretrained embeddings to reduce data sparsity and allow the model to produce more generalizable outputs.