Language-Agnostic Twitter-Bot Detection

Jürgen Knauth


Abstract
In this paper we address the problem of detecting Twitter bots. We analyze a dataset of 8385 Twitter accounts, comprising both humans and different kinds of bots, together with their tweets. We use this data to train machine learning classifiers that distinguish between human and bot accounts. We identify features that are easy to extract while still providing good results. We analyze different feature groups based on account-specific, tweet-specific and behavior-specific features and measure their performance compared to other state-of-the-art bot detection methods. For easy future portability of our work we focus on language-agnostic features. With AdaBoost, the best performing classifier, we achieve an accuracy of 0.988 and an AUC of 0.995. As the creation of good training data in machine learning is often difficult, especially in the domain of Twitter bot detection, we additionally analyze to what extent smaller amounts of training data lead to useful results by reviewing cross-validated learning curves. Our results indicate that using few but expressive features already has a good practical benefit for bot detection, especially if only a small amount of training data is available.
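The workflow summarized in the abstract (boosted classifier, accuracy and AUC, performance as a function of training-set size) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data is synthetic, the three features (tweets_per_day, follower_friend_ratio, url_fraction) are hypothetical stand-ins for language-agnostic features, AdaBoost is implemented from scratch over decision stumps, and a single held-out split replaces the cross-validated learning curves the abstract mentions.

```python
import math
import random

def make_accounts(n, seed=0):
    """Synthetic stand-in data: (features, label), label +1 = bot, -1 = human."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        is_bot = rng.random() < 0.5
        # Hypothetical language-agnostic features, NOT the paper's feature set.
        tweets_per_day = rng.gauss(40.0 if is_bot else 8.0, 10.0)
        follower_friend_ratio = rng.gauss(0.3 if is_bot else 1.2, 0.4)
        url_fraction = rng.gauss(0.7 if is_bot else 0.2, 0.2)
        features = [tweets_per_day, follower_friend_ratio, url_fraction]
        data.append((features, 1 if is_bot else -1))
    return data

def best_stump(data, weights):
    """Return (error, feature, threshold, sign) of the stump with lowest weighted error."""
    best = None
    for f in range(len(data[0][0])):
        for threshold in sorted({x[f] for x, _ in data}):
            for sign in (1, -1):
                err = sum(w for (x, y), w in zip(data, weights)
                          if (sign if x[f] > threshold else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, f, threshold, sign)
    return best

def train_adaboost(data, rounds=10):
    """Discrete AdaBoost over decision stumps."""
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, threshold, sign = best_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, threshold, sign))
        # Misclassified accounts gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * (sign if x[f] > threshold else -sign))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def score(ensemble, x):
    """Weighted vote of all stumps; a positive score means 'bot'."""
    return sum(alpha * (sign if x[f] > threshold else -sign)
               for alpha, f, threshold, sign in ensemble)

def accuracy(ensemble, data):
    return sum((1 if score(ensemble, x) > 0 else -1) == y for x, y in data) / len(data)

def auc(ensemble, data):
    """Probability that a random bot outscores a random human (ties count 0.5)."""
    pos = [score(ensemble, x) for x, y in data if y == 1]
    neg = [score(ensemble, x) for x, y in data if y == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Simplified learning curve: grow the training set, evaluate on a fixed test set.
train, test = make_accounts(200, seed=1), make_accounts(200, seed=2)
for size in (20, 50, 100, 200):
    model = train_adaboost(train[:size])
    print(size, round(accuracy(model, test), 3), round(auc(model, test), 3))
```

Each printed row gives the training-set size followed by held-out accuracy and AUC. The numbers depend entirely on the synthetic data and say nothing about the results reported above; in practice a library implementation with proper cross-validation would be the more natural choice.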
Anthology ID:
R19-1065
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
550–558
URL:
https://aclanthology.org/R19-1065
DOI:
10.26615/978-954-452-056-4_065
Cite (ACL):
Jürgen Knauth. 2019. Language-Agnostic Twitter-Bot Detection. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 550–558, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Language-Agnostic Twitter-Bot Detection (Knauth, RANLP 2019)
PDF:
https://aclanthology.org/R19-1065.pdf