An Annotated Corpus for Sexism Detection in French Tweets

Patricia Chiril; Véronique Moriceau; Farah Benamara; Alda Mari; Gloria Origgi; Marlène Coulomb-Gully

Abstract

Social media networks have become a space where users freely express their opinions and sentiments, which can lead to the wide spread of hateful or abusive messages that need to be moderated. This paper presents the first French corpus annotated for sexism detection, composed of about 12,000 tweets. In a context where the moderation of offensive content on social media is now regulated by European laws, we think it is important not only to detect sexist content automatically but also to identify whether a message with sexist content is really sexist (i.e. addressed to a woman or describing a woman or women in general) or is a story of sexism experienced by a woman. This distinction is the novelty of our annotation scheme. We also report preliminary results for sexism detection obtained with a deep learning approach. Our experiments show encouraging results.

Anthology ID: 2020.lrec-1.175
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 1397–1403
Language: English
URL: https://aclanthology.org/2020.lrec-1.175/
Cite (ACL): Patricia Chiril, Véronique Moriceau, Farah Benamara, Alda Mari, Gloria Origgi, and Marlène Coulomb-Gully. 2020. An Annotated Corpus for Sexism Detection in French Tweets. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1397–1403, Marseille, France. European Language Resources Association.
Cite (Informal): An Annotated Corpus for Sexism Detection in French Tweets (Chiril et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.175.pdf
