A French Corpus for Event Detection on Twitter

Béatrice Mazoyer, Julia Cagé, Nicolas Hervé, Céline Hudelot


Abstract
We present Event2018, a corpus annotated for event detection tasks, consisting of 38 million tweets in French (retweets excluded) including more than 130,000 tweets manually annotated by three annotators as related or unrelated to a given event. The 243 events were selected both from press articles and from subjects trending on Twitter during the annotation period (July to August 2018). In total, more than 95,000 tweets were annotated as related to one of the selected events. We also provide the titles and URLs of 15,500 news articles automatically detected as related to these events. In addition to this corpus, we detail the results of our event detection experiments on both this dataset and another publicly available dataset of tweets in English. We ran extensive tests with different types of text embeddings and a standard Topic Detection and Tracking algorithm, and detail our evaluation method. We show that tf-idf vectors allow the best performance for this task on both corpora. These results are intended to serve as a baseline for researchers wishing to test their own event detection systems on our corpus.
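To make the baseline concrete, the sketch below pairs tf-idf vectors with nearest-neighbour threshold clustering, one common formulation in the Topic Detection and Tracking literature. It is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the smoothed idf formula, the function names, and the similarity threshold of 0.3 are all hypothetical choices, and the idf weights are computed once over the whole input rather than incrementally over a stream.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised sparse tf-idf vector (a dict) per document."""
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed idf, so terms present in every document keep a non-zero weight.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def cluster_stream(docs, threshold=0.3):
    """Assign each document, in order, to the cluster of its most similar
    predecessor, or open a new cluster if no predecessor is similar enough.

    The threshold value is a hypothetical default, not a tuned setting.
    """
    vectors = tfidf_vectors(docs)
    labels = []
    next_label = 0
    for i, vec in enumerate(vectors):
        best_sim, best_j = 0.0, None
        for j in range(i):
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j is not None and best_sim >= threshold:
            labels.append(labels[best_j])  # join the nearest neighbour's event
        else:
            labels.append(next_label)  # no close neighbour: start a new event
            next_label += 1
    return labels


tweets = [
    "train strike paris monday",
    "paris train strike continues",
    "world cup final france",
    "france wins world cup final",
]
print(cluster_stream(tweets))
```

On these four toy messages the first two share one cluster label and the last two share another. The comparison against every earlier document is quadratic in the number of documents, so a corpus of this size would need a restricted search window or an approximate nearest-neighbour index.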
Anthology ID:
2020.lrec-1.763
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6220–6227
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.763
Cite (ACL):
Béatrice Mazoyer, Julia Cagé, Nicolas Hervé, and Céline Hudelot. 2020. A French Corpus for Event Detection on Twitter. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6220–6227, Marseille, France. European Language Resources Association.
Cite (Informal):
A French Corpus for Event Detection on Twitter (Mazoyer et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.763.pdf