Patricia Chiril


2025

pdf bib
Seeds of Discourse: A Multilingual Corpus of Direct Quotations from African Media on Agricultural Biotechnologies
Patricia Chiril | Trevor Spreadbury | Joeva Sean Rock | Brian Dowd-Uribe | David Uminsky
Findings of the Association for Computational Linguistics: NAACL 2025

Direct quotations play a crucial role in journalism by substantiating claims and enhancing persuasive communication. This makes news articles a rich resource for opinion mining, providing valuable insights into the topics they cover. This paper presents the first multilingual corpora (English and French) featuring both manually annotated (1,657) and automatically extracted (102,483) direct quotations related to agricultural biotechnologies from a curated list of Africa-based news sources. In addition, we provide 665 instances annotated for Aspect-Based Sentiment Analysis, enabling a fine-grained examination of sentiment toward key aspects of agricultural biotechnologies. These corpora are freely available to the research community for future work on media discourse surrounding agricultural biotechnologies.

pdf bib
Entailment Progressions: A Robust Approach to Evaluating Reasoning Within Larger Discourse
Rishabh Shastry | Patricia Chiril | Joshua Charney | David Uminsky
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

Textual entailment, or the ability to deduce whether a proposed hypothesis is logically supported by a given premise, has historically been applied to the evaluation of language modelling efficiency in tasks like question answering and text summarization. However, we hypothesize that these zero-shot entailment evaluations can be extended to the task of evaluating discourse within larger textual narratives. In this paper, we propose a simple but effective method that sequentially evaluates changes in textual entailment between sentences within a larger text, in an approach we denote as “Entailment Progressions”. These entailment progressions aim to capture the inference relations between sentences as an underlying component capable of distinguishing texts generated from various models and procedures. Our results suggest that entailment progressions can be used to effectively distinguish between machine-generated and human-authored texts across multiple established benchmark corpora and our own EP4MGT dataset. Additionally, our method displays robustness in performance when evaluated on paraphrased texts a technique that has historically affected the performance of well-established metrics when distinguishing between machine generated and human authored texts.

2023

pdf bib
What Did You Learn To Hate? A Topic-Oriented Analysis of Generalization in Hate Speech Detection
Tom Bourgeade | Patricia Chiril | Farah Benamara | Véronique Moriceau
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Hate speech has unfortunately become a significant phenomenon on social media platforms, and it can cover various topics (misogyny, sexism, racism, xenophobia, etc.) and targets (e.g., black people, women). Various hate speech detection datasets have been proposed, some annotated for specific topics, and others for hateful speech in general. In either case, they often employ different annotation guidelines, which can lead to inconsistencies, even in datasets focusing on the same topics. This can cause issues in models trying to generalize across more data and more topics in order to improve detection accuracy. In this paper, we propose, for the first time, a topic-oriented approach to study generalization across popular hate speech datasets. We first perform a comparative analysis of the performances of Transformer-based models in capturing topic-generic and topic-specific knowledge when trained on different datasets. We then propose a novel, simple yet effective approach to study more precisely which topics are best captured in implicit manifestations of hate, showing that selecting combinations of datasets with better out-of-domain topical coverage improves the reliability of automatic hate speech detection.

2022

pdf bib
Tales and Tropes: Gender Roles from Word Embeddings in a Century of Children’s Books
Anjali Adukia | Patricia Chiril | Callista Christ | Anjali Das | Alex Eble | Emileigh Harrison | Hakizumwami Birali Runesha
Proceedings of the 29th International Conference on Computational Linguistics

The manner in which gender is portrayed in materials used to teach children conveys messages about people’s roles in society. In this paper, we measure the gendered depiction of central domains of social life in 100 years of highly influential children’s books. We make two main contributions: (1) we find that the portrayal of gender in these books reproduces traditional gender norms in society, and (2) we publish StoryWords 1.0, the first word embeddings trained on such a large body of children’s literature. We find that, relative to males, females are more likely to be represented in relation to their appearance than in relation to their competence; second, they are more likely to be represented in relation to their role in the family than their role in business. Finally, we find that non-binary or gender-fluid individuals are rarely mentioned. Our analysis advances understanding of the different messages contained in content commonly used to teach children, with immediate applications for practice, policy, and research.

2021

pdf bib
“Be nice to your wife! The restaurants are closed”: Can Gender Stereotype Detection Improve Sexism Classification?
Patricia Chiril | Farah Benamara | Véronique Moriceau
Findings of the Association for Computational Linguistics: EMNLP 2021

In this paper, we focus on the detection of sexist hate speech against women in tweets studying for the first time the impact of gender stereotype detection on sexism classification. We propose: (1) the first dataset annotated for gender stereotype detection, (2) a new method for data augmentation based on sentence similarity with multilingual external datasets, and (3) a set of deep learning experiments first to detect gender stereotypes and then, to use this auxiliary task for sexism detection. Although the presence of stereotypes does not necessarily entail hateful content, our results show that sexism classification can definitively benefit from gender stereotype detection.

2020

pdf bib
He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist
Patricia Chiril | Véronique Moriceau | Farah Benamara | Alda Mari | Gloria Origgi | Marlène Coulomb-Gully
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In a context of offensive content mediation on social media now regulated by European laws, it is important not only to be able to automatically detect sexist content but also to identify if a message with a sexist content is really sexist or is a story of sexism experienced by a woman. We propose: (1) a new characterization of sexist content inspired by speech acts theory and discourse analysis studies, (2) the first French dataset annotated for sexism detection, and (3) a set of deep learning experiments trained on top of a combination of several tweet’s vectorial representations (word embeddings, linguistic features, and various generalization strategies). Our results are encouraging and constitute a first step towards offensive content moderation.

pdf bib
An Annotated Corpus for Sexism Detection in French Tweets
Patricia Chiril | Véronique Moriceau | Farah Benamara | Alda Mari | Gloria Origgi | Marlène Coulomb-Gully
Proceedings of the Twelfth Language Resources and Evaluation Conference

Social media networks have become a space where users are free to relate their opinions and sentiments which may lead to a large spreading of hatred or abusive messages which have to be moderated. This paper presents the first French corpus annotated for sexism detection composed of about 12,000 tweets. In a context of offensive content mediation on social media now regulated by European laws, we think that it is important to be able to detect automatically not only sexist content but also to identify if a message with a sexist content is really sexist (i.e. addressed to a woman or describing a woman or women in general) or is a story of sexism experienced by a woman. This point is the novelty of our annotation scheme. We also propose some preliminary results for sexism detection obtained with a deep learning approach. Our experiments show encouraging results.

2019

pdf bib
Multilingual and Multitarget Hate Speech Detection in Tweets
Patricia Chiril | Farah Benamara Zitoune | Véronique Moriceau | Marlène Coulomb-Gully | Abhishek Kumar
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Social media networks have become a space where users are free to relate their opinions and sentiments which may lead to a large spreading of hatred or abusive messages which have to be moderated. This paper proposes a supervised approach to hate speech detection from a multilingual perspective. We focus in particular on hateful messages towards two different targets (immigrants and women) in English tweets, as well as sexist messages in both English and French. Several models have been developed ranging from feature-engineering approaches to neural ones. Our experiments show very encouraging results on both languages.

pdf bib
The binary trio at SemEval-2019 Task 5: Multitarget Hate Speech Detection in Tweets
Patricia Chiril | Farah Benamara Zitoune | Véronique Moriceau | Abhishek Kumar
Proceedings of the 13th International Workshop on Semantic Evaluation

The massive growth of user-generated web content through blogs, online forums and most notably, social media networks, led to a large spreading of hatred or abusive messages which have to be moderated. This paper proposes a supervised approach to hate speech detection towards immigrants and women in English tweets. Several models have been developed ranging from feature-engineering approaches to neural ones.