Anaïs Ollagnier
Also published as: Anais Ollagnier
2026
Antisocial Behavior Prediction: A Survey and Practical Guide
Anaïs Ollagnier
The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2026)
Antisocial behavior (ASB) on social media encompasses online behaviors that harm individuals, groups, or platform ecosystems, including hate speech, harassment, cyberbullying, trolling, and coordinated abuse. While most prior work has focused on detecting harm after it occurs, a growing body of research on ASB prediction seeks to forecast future harmful outcomes before they materialize, including—but not limited to—hate-speech diffusion, conversational derailment, and user recidivism. However, this emerging field remains fragmented, with limited conceptual grounding and few integrative frameworks. This paper establishes a foundation for ASB prediction by introducing a structured taxonomy spanning temporal, structural, and behavioral dimensions. Drawing on 49 machine learning studies identified through a literature review, we map predictive goals to datasets, modeling choices, and evaluation practices, and identify key challenges, including the lack of standardized benchmarks, the dominance of text-centric representations, and trade-offs between accuracy and interpretability. We conclude by outlining actionable directions toward more robust, generalizable, and responsible ASB prediction systems.
2025
A Topicality-Driven QUD Model for Discourse Processing
Yingxue Fu | Mark-Jan Nederhof | Anais Ollagnier
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Question Under Discussion (QUD) is a discourse framework that has attracted growing interest in NLP in recent years. Among existing QUD models, the QUD tree approach (Riester, 2019) focuses on reconstructing QUDs and their hierarchical relationships, using a single tree to represent discourse structure. A prior implementation shows moderate inter-annotator agreement, highlighting the challenging nature of this task. In this paper, we propose a new QUD model for annotating hierarchical discourse structure. Our annotation achieves high inter-annotator agreement: 81.45% for short files and 79.53% for long files of Wall Street Journal articles. We show preliminary results on using GPT-4 for automatic annotation, which suggest that one of the best-performing LLMs still struggles with capturing hierarchical discourse structure. Moreover, we compare the annotations with RST annotations. Lastly, we present an approach for integrating hierarchical and local discourse relation annotations with the proposed model.
CyberAgressionAdo-Large: French Multiparty Chat Dataset to Address Online Hate
Anaïs Ollagnier | Elena Cabrio | Serena Villata | Valerio Basile
Traitement Automatique des Langues, Volume 65, Numéro 3 : Discours de haine : ressources linguistiques, méthodes et applications [Abusive Language: Linguistic Resources, Methods and Applications]
2024
CyberAgressionAdo-v2: Leveraging Pragmatic-Level Information to Decipher Online Hate in French Multiparty Chats
Anais Ollagnier
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
As part of the release of the CyberAgressionAdo-V2 dataset, this paper introduces a new tagset that includes tags marking pragmatic-level information occurring in cyberbullying situations. The previous version of this dataset, CyberAgressionAdo-V1, consists of aggressive multiparty chats in French annotated using a hierarchical tagset developed to describe bullying narrative events, including the participant roles, the presence of hate speech, and the type of verbal abuse, among others. In contrast, CyberAgressionAdo-V2 uses a multi-label, fine-grained tagset marking the discursive role of exchanged messages as well as the context in which they occur, for instance attack (ATK), defend (DFN), counterspeech (CNS), abet/instigate (AIN), gaslight (GSL), etc. This paper provides a comprehensive overview of the annotation tagset and presents statistical insights derived from its application. Additionally, we address the challenges encountered when annotating pragmatic-level information in this context, conducting a thorough analysis of annotator disagreements. The resulting dataset comprises 19 conversations that have been manually annotated and is now available to facilitate further research in the field.
2022
CyberAgressionAdo-v1: a Dataset of Annotated Online Aggressions in French Collected through a Role-playing Game
Anaïs Ollagnier | Elena Cabrio | Serena Villata | Catherine Blaya
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Over the past decades, the number of episodes of cyber aggression occurring online has grown substantially, especially among teens. Most solutions investigated by the NLP community to curb such online abusive behaviors consist of supervised approaches relying on annotated data extracted from social media. However, recent studies have highlighted that private instant messaging platforms are major mediums of cyber aggression among teens. As such interactions remain invisible due to the apps' privacy policies, very few datasets collecting aggressive conversations are available for the computational analysis of language. To overcome this limitation, in this paper we present the CyberAgressionAdo-V1 dataset, containing aggressive multiparty chats in French collected through a role-playing game in high schools, and annotated at different layers. We describe the data collection and annotation phases, carried out in the context of an EU and a national research project, and provide an insightful analysis of the different types of aggression and verbal abuse, depending on the targeted victims (individuals or communities), emerging from the collected data.
2015
Analyse en dépendance et classification de requêtes en langue naturelle, application à la recommandation de livres [Dependency parsing and classification of natural language queries: application to book recommendation]
Anaïs Ollagnier | Sébastien Fournier | Patrice Bellot
Traitement Automatique des Langues, Volume 56, Numéro 3 : Recherche d'Information [Information Retrieval]
2014
Impact of the nature and size of the training set on performance in the automatic detection of named entities (Impact de la nature et de la taille des corpus d’apprentissage sur les performances dans la détection automatique des entités nommées) [in French]
Anaïs Ollagnier | Sébastien Fournier | Patrice Bellot | Frédéric Béchet
Proceedings of TALN 2014 (Volume 2: Short Papers)