An Empirical Study on Pre-trained Embeddings and Language Models for Bot Detection

Andres Garcia-Silva; Cristian Berrio; José Manuel Gómez-Pérez

doi:10.18653/v1/W19-4317

Abstract

Fine-tuning pre-trained language models has significantly advanced the state of the art in a wide range of NLP downstream tasks. Usually, such language models are learned from large and well-formed text corpora from, e.g., encyclopedic resources, books or news. However, a significant amount of the text to be analyzed nowadays is Web data, often from social media. In this paper we consider the research question: How do standard pre-trained language models generalize and capture the peculiarities of the rather short, informal and frequently automatically generated text found in social media? To answer this question, we focus on bot detection on Twitter as our evaluation task and test the performance of fine-tuning approaches based on language models against popular neural architectures such as LSTM and CNN combined with pre-trained and contextualized embeddings. Our results also show strong performance variations among the different language model approaches, which suggests the need for further research.

Anthology ID: W19-4317
Volume: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Month: August
Year: 2019
Address: Florence, Italy
Editors: Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, Marek Rei
Venue: RepL4NLP
SIG: SIGREP
Publisher: Association for Computational Linguistics
Pages: 148–155
URL: https://aclanthology.org/W19-4317/
DOI: 10.18653/v1/W19-4317
Cite (ACL): Andres Garcia-Silva, Cristian Berrio, and José Manuel Gómez-Pérez. 2019. An Empirical Study on Pre-trained Embeddings and Language Models for Bot Detection. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 148–155, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): An Empirical Study on Pre-trained Embeddings and Language Models for Bot Detection (Garcia-Silva et al., RepL4NLP 2019)
PDF: https://aclanthology.org/W19-4317.pdf
