GSAC: A Gujarati Sentiment Analysis Corpus from Twitter

Monil Gokani, Radhika Mamidi


Abstract
Sentiment Analysis is an important task for analysing online content across languages, with applications such as content moderation and opinion mining. Although a significant number of resources are available for Sentiment Analysis in several Indian languages, no large-scale, open-access corpora exist for Gujarati. Our paper presents and describes the Gujarati Sentiment Analysis Corpus (GSAC), which has been sourced from Twitter and manually annotated by native speakers of the language. We describe our collection and annotation processes in detail and conduct extensive experiments on our corpus to provide reliable baselines for future work using our dataset.
Anthology ID:
2023.wassa-1.12
Volume:
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Jeremy Barnes, Orphée De Clercq, Roman Klinger
Venue:
WASSA
Publisher:
Association for Computational Linguistics
Pages:
129–137
URL:
https://aclanthology.org/2023.wassa-1.12
DOI:
10.18653/v1/2023.wassa-1.12
Cite (ACL):
Monil Gokani and Radhika Mamidi. 2023. GSAC: A Gujarati Sentiment Analysis Corpus from Twitter. In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, pages 129–137, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
GSAC: A Gujarati Sentiment Analysis Corpus from Twitter (Gokani & Mamidi, WASSA 2023)
PDF:
https://aclanthology.org/2023.wassa-1.12.pdf
Video:
https://aclanthology.org/2023.wassa-1.12.mp4