A German Twitter Snapshot

Tatjana Scheffler


Abstract
We present a new corpus of German tweets. Because relatively few messages on Twitter are written in German, it is possible to collect a virtually complete snapshot of German Twitter messages over a period of time. In this paper, we present our collection method, which produced a corpus of 24 million tweets representing a large majority of all German tweets sent in April 2013. Further, we analyze this representative data set and characterize the German twitterverse. While German Twitter data is similar to other Twitter data in terms of its temporal distribution, German Twitter users are much more reluctant to share geolocation information with their tweets. Finally, the corpus collection method allows for a study of discourse phenomena in the Twitter data, which is structured into discussion threads.
Anthology ID:
L14-1101
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2284–2289
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1146_Paper.pdf
Cite (ACL):
Tatjana Scheffler. 2014. A German Twitter Snapshot. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2284–2289, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
A German Twitter Snapshot (Scheffler, LREC 2014)