Towards Sentiment Analysis of Tobacco Products’ Usage in Social Media

Venkata Himakar Yanamandra, Kartikey Pant, Radhika Mamidi


Abstract
Contemporary tobacco-related studies mostly focus on a single social media platform and thus miss a broader audience. Moreover, they rely heavily on labeled datasets, which are expensive to create. In this work, we explore sentiment and product identification on tobacco-related text from two social media platforms. We release the SentiSmoke-Twitter and SentiSmoke-Reddit datasets, along with a comprehensive annotation schema for identifying sentiment toward tobacco products. We then perform benchmarking text classification experiments using state-of-the-art models, including BERT, RoBERTa, and DistilBERT. Our experiments show F1 scores as high as 0.72 for sentiment identification on the Twitter dataset and, using semi-supervised learning on the Reddit dataset, 0.46 for sentiment identification and 0.57 for product identification.
Anthology ID:
2021.ranlp-1.173
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1545–1552
URL:
https://aclanthology.org/2021.ranlp-main.173
PDF:
https://aclanthology.org/2021.ranlp-main.173.pdf