The Constant in HATE: Toxicity in Reddit across Topics and Languages

Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek T.J.M. Vossen


Abstract
Toxic language remains an ongoing challenge on social media platforms, presenting significant issues for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages. By aligning languages with topics such as Culture, Politics, and News, we analyze how toxicity spikes within different communities. We observe consistent patterns across languages, with toxicity increasing within the same topics, and we also identify significant differences, where specific language communities vary notably on certain topics.
Anthology ID:
2024.trac-1.1
Volume:
Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Bharathi Raja Chakravarthi, Bornini Lahiri, Siddharth Singh, Shyam Ratan
Venues:
TRAC | WS
Publisher:
ELRA and ICCL
Pages:
1–11
URL:
https://aclanthology.org/2024.trac-1.1
Cite (ACL):
Wondimagegnhue Tsegaye Tufa, Ilia Markov, and Piek T.J.M. Vossen. 2024. The Constant in HATE: Toxicity in Reddit across Topics and Languages. In Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024, pages 1–11, Torino, Italia. ELRA and ICCL.
Cite (Informal):
The Constant in HATE: Toxicity in Reddit across Topics and Languages (Tufa et al., TRAC-WS 2024)
PDF:
https://aclanthology.org/2024.trac-1.1.pdf