Anaïs Ollagnier

Also published as: Anais Ollagnier


2026

Antisocial behavior (ASB) on social media encompasses online behaviors that harm individuals, groups, or platform ecosystems, including hate speech, harassment, cyberbullying, trolling, and coordinated abuse. While most prior work has focused on detecting harm after it occurs, a growing body of research on ASB prediction seeks to forecast future harmful outcomes before they materialize, including—but not limited to—hate-speech diffusion, conversational derailment, and user recidivism. However, this emerging field remains fragmented, with limited conceptual grounding and few integrative frameworks. This paper establishes a foundation for ASB prediction by introducing a structured taxonomy spanning temporal, structural, and behavioral dimensions. Drawing on 49 machine learning studies identified through a literature review, we map predictive goals to datasets, modeling choices, and evaluation practices, and identify key challenges, including the lack of standardized benchmarks, the dominance of text-centric representations, and trade-offs between accuracy and interpretability. We conclude by outlining actionable directions toward more robust, generalizable, and responsible ASB prediction systems.

2025

Question Under Discussion (QUD) is a discourse framework that has attracted growing interest in NLP in recent years. Among existing QUD models, the QUD tree approach (Riester, 2019) focuses on reconstructing QUDs and their hierarchical relationships, using a single tree to represent discourse structure. Prior implementation shows moderate inter-annotator agreement, highlighting the challenging nature of this task. In this paper, we propose a new QUD model for annotating hierarchical discourse structure. Our annotation achieves high inter-annotator agreement: 81.45% for short files and 79.53% for long files of Wall Street Journal articles. We show preliminary results on using GPT-4 for automatic annotation, which suggests that one of the best-performing LLMs still struggles with capturing hierarchical discourse structure. Moreover, we compare the annotations with RST annotations. Lastly, we present an approach for integrating hierarchical and local discourse relation annotations with the proposed model.

2024

As a part of the release of the CyberAgressionAdo-V2 dataset, this paper introduces a new tagset that includes tags marking pragmatic-level information occurring in cyberbullying situations. The previous version of this dataset, CyberAgressionAdo-V1, consists of aggressive multiparty chats in French annotated using a hierarchical tagset developed to describe bullying narrative events including the participant roles, the presence of hate speech, the type of verbal abuse, among others. In contrast, CyberAgressionAdo-V2 uses a multi-label, fine-grained tagset marking the discursive role of exchanged messages as well as the context in which they occur — for instance, attack (ATK), defend (DFN), counterspeech (CNS), abet/instigate (AIN), gaslight (GSL), etc. This paper provides a comprehensive overview of the annotation tagset and presents statistical insights derived from its application. Additionally, we address the challenges encountered when annotating pragmatic-level information in this context, conducting a thorough analysis of annotator disagreements. The resulting dataset comprises 19 conversations that have been manually annotated and is now available to facilitate further research in the field.

2022

Over the past decades, the number of episodes of cyber aggression occurring online has grown substantially, especially among teens. Most solutions investigated by the NLP community to curb such online abusive behaviors consist of supervised approaches relying on annotated data extracted from social media. However, recent studies have highlighted that private instant messaging platforms are major mediums of cyber aggression among teens. As such interactions remain invisible due to the app privacy policies, very few datasets collecting aggressive conversations are available for the computational analysis of language. In order to overcome this limitation, in this paper we present the CyberAgressionAdo-V1 dataset, containing aggressive multiparty chats in French collected through a role-playing game in high-schools, and annotated at different layers. We describe the data collection and annotation phases, carried out in the context of a EU and a national research projects, and provide insightful analysis on the different types of aggression and verbal abuse depending on the targeted victims (individuals or communities) emerging from the collected data.

2015

2014