Classification of Censored Tweets in Chinese Language using XLNet

Shaikh Sahil Ahmed, Anand Kumar M.


Abstract
With the growth of advanced technology, social media networks play a significant role in people's lives. Censorship, the suppression of speech, public communication, or other information, plays a large role on social media, where content may be suppressed because it is considered harmful, sensitive, or inconvenient. Censorship is carried out by authorities such as institutions, governments, and other organizations. This paper presents a model that classifies tweets as censored or uncensored, framed as a binary classification task. It describes our submission to the Censorship shared task of the NLP4IF 2021 workshop. We experimented with several pre-trained transformer-based models, of which XLNet achieved the highest accuracy. We fine-tuned XLNet to improve its performance, obtained reasonable accuracy, and also report other performance metrics.
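The abstract mentions accuracy "and other performance metrics" for the censored/uncensored task without naming them. The sketch below shows how the usual binary-classification metrics follow from true positives, false positives, and false negatives. It assumes precision, recall, and F1 with "censored" encoded as label 1 (the positive class); both the metric choice and the label encoding are illustrative assumptions, not details taken from the paper, and `binary_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for a binary task.

    Assumption (hypothetical encoding): 1 = censored, 0 = uncensored.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one example")

    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells needed for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when the positive class is never
    # predicted (precision) or never present (recall).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not the paper's data.
    gold = [1, 0, 1, 1, 0, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(binary_metrics(gold, predicted))
```

For the toy labels, the model gets 4 of 6 tweets right, with 2 true positives, 1 false positive, and 1 false negative, so every metric comes out to about 0.667. In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would usually be used instead of a hand-written helper.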
Anthology ID: 2021.nlp4if-1.21
Volume: Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
Month: June
Year: 2021
Address: Online
Venue: NLP4IF
Publisher: Association for Computational Linguistics
Pages: 136–139
URL: https://aclanthology.org/2021.nlp4if-1.21
DOI: 10.18653/v1/2021.nlp4if-1.21
Cite (ACL): Shaikh Sahil Ahmed and Anand Kumar M. 2021. Classification of Censored Tweets in Chinese Language using XLNet. In Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda, pages 136–139, Online. Association for Computational Linguistics.
Cite (Informal): Classification of Censored Tweets in Chinese Language using XLNet (Ahmed & Kumar M., NLP4IF 2021)
PDF: https://aclanthology.org/2021.nlp4if-1.21.pdf