Alda Mari

2025

United We Fine-Tune: Structurally Complementary Datasets for Hope Speech Detection
Priya Dharshini Krishnaraj | Tulio Ferreira Leite da Silva | Gonzalo Freijedo Aduna | Samuel Chen | Farah Benamara | Alda Mari
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

We propose a fine-tuning strategy for English Multi-class Hope Speech Detection using Mistral, leveraging two complementary datasets: PolyHope and CDB, a new unified framework for hope speech detection. While the former provides nuanced hope-related categories such as GENERALIZED, REALISTIC, and UNREALISTIC HOPE, the later introduces linguistically grounded dimensions including COUNTERFACTUAL, DESIRE, and BELIEF. By fine-tuning Mistral on both datasets, we enable the model to capture deeper semantic representations of hope. In addition to fine-tuning, we developed advanced prompting strategies which provide interpretable, zero-shot alternatives and further inform annotation and classification designs. Our approach achieved third place in the multi-class (Macro F1=71.77) and sixth in the binary (Macro F1=85.35) settings.

pdf bib abs

Computational modeling of user-generated desires on social media can significantly aid decision-makers across various fields. Initially explored through wish speech,this task has evolved into a nuanced examination of hope speech. To enhance understanding and detection, we propose a novel scheme rooted in formal semantics approaches to modality, capturing both future-oriented hopes through desires and beliefs and the counterfactuality of past unfulfilled wishes and regrets. We manually re-annotated existing hope speech datasets and built a new one which constitutes a new benchmark in the field. We also explore the capabilities of LLMs in automatically detecting hope speech, relying on several prompting strategies. To the best of our knowledge, this is the first attempt towards a language-driven decomposition of the notional category hope and its automatic detection in a unified setting.

2024

pdf bib abs

In emergency situations users of social networks convey all sorts of what have been called communicative intentions, well-known since the work of Austin (1962) and Searle (1969) as speech acts (SA). While speech acts have been the focus of close scrutiny in the philosophical and linguistic literature (see (Portner, 2018) for extended discussion), their role has been only rarely understood and exploited in processing social media content about crisis events, our focus here. Current work on communicative intentions in social media are topic-oriented, focusing on the correlation between SA and specific topics such as crisis (e.g., earthquakes) but also politics, celebrities, cooking, travel, etc. It has been observed that people globally tend to react to natural disasters with SA distinct from those used in other contexts (e.g., celebrities, which are essentially made up of comments). Here, we explore the further hypothesis of a correlation between different SA types and urgency and propose an in depth linguistic and computational analysis of communicative intentions in tweets from an urgency-oriented perspective. Indeed, SA are mostly relevant to identify intentions, desires, plans and preferences towards action and to ultimately produce a system intended to help rescue teams. Our contribution is four-fold and consists of: (1) A two-layer annotation scheme of speech acts both at the tweet and sub-tweet levels, (2) A new French dataset of about 13K tweets annotated for both urgency and SA, targeting both expected (e.g., storms) and unexpected or sudden (e.g., building collapse, explosion) events, (3) A thorough analysis of the annotations studying in particular the correlation between SA and the urgency of the message, SA and intentions to act categories (e.g., human damages), and SA and crisis types, finally, (4) A set of deep learning experiments to detect SA in crises related corpora. Our results show a strong correlation between SA and urgency annotations at both the tweet and sub-tweet levels with a particular salient correlation in the latter case, which constitutes a first important step towards SA-aware NLP-based crisis management on social media.

2023

pdf bib abs

Pragmatic Annotation of Articles Related to Police Brutality
Tess Feyen | Alda Mari | Paul Portner
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

The annotation task we elaborated aims at describing the contextual factors that influence the appearance and interpretation of moral predicates, in newspaper articles on police brutality, in French and in English. The paper provides a brief review of the literature on moral predicates and their relation with context. The paper also describes the elaboration of the corpus and the ontology. Our hypothesis is that the use of moral adjectives and their appearance in context could change depending on the political orientation of the journal. We elaborated an annotation task to investigate the precise contexts discussed in articles on police brutality. The paper concludes by describing the study and the annotation task in details.

pdf bib abs

Classification de tweets en situation d’urgence pour la gestion de crises
Romain Meunier | Leila Moudjari | Farah Benamara | Véronique Moriceau | Alda Mari | Patricia Stolf
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux -- articles longs

Le traitement de données provenant de réseaux sociaux en temps réel est devenu une outil attractifdans les situations d’urgence, mais la surcharge d’informations reste un défi à relever. Dans cet article,nous présentons un nouveau jeu de données en français annoté manuellement pour la gestion de crise.Nous testons également plusieurs modèles d’apprentissage automatique pour classer des tweets enfonction de leur pertinence, de l’urgence et de l’intention qu’ils véhiculent afin d’aider au mieux lesservices de secours durant les crises selon des méthodes d’évaluation spécifique à la gestion de crise.Nous évaluons également nos modèles lorsqu’ils sont confrontés à de nouvelles crises ou même denouveaux types de crises, avec des résultats encourageants

2022

pdf bib abs

Recognizing speech acts (SA) is crucial for capturing meaning beyond what is said, making communicative intentions particularly relevant to identify urgent messages. This paper attempts to measure for the first time the impact of SA on urgency detection during crises,006in tweets. We propose a new dataset annotated for both urgency and SA, and develop several deep learning architectures to inject SA into urgency detection while ensuring models generalisability. Our results show that taking speech acts into account in tweet analysis improves information type detection in an out-of-type configuration where models are evaluated in unseen event types during training. These results are encouraging and constitute a first step towards SA-aware disaster management in social media.

pdf bib abs

Give me your Intentions, I’ll Predict our Actions: A Two-level Classification of Speech Acts for Crisis Management in Social Media
Enzo Laurenti | Nils Bourgon | Farah Benamara | Alda Mari | Véronique Moriceau | Camille Courgeon
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Discovered by (Austin,1962) and extensively promoted by (Searle, 1975), speech acts (SA) have been the object of extensive discussion in the philosophical and the linguistic literature, as well as in computational linguistics where the detection of SA have shown to be an important step in many down stream NLP applications. In this paper, we attempt to measure for the first time the role of SA on urgency detection in tweets, focusing on natural disasters. Indeed, SA are particularly relevant to identify intentions, desires, plans and preferences towards action, providing therefore actionable information that will help to set priorities for the human teams and decide appropriate rescue actions. To this end, we come up here with four main contributions: (1) A two-layer annotation scheme of SA both at the tweet and subtweet levels, (2) A new French dataset of 6,669 tweets annotated for both urgency and SA, (3) An in-depth analysis of the annotation campaign, highlighting the correlation between SA and urgency categories, and (4) A set of deep learning experiments to detect SA in a crisis corpus. Our results show that SA are correlated with urgency which is a first important step towards SA-aware NLP-based crisis management on social media.

2020

pdf bib abs

Social media networks have become a space where users are free to relate their opinions and sentiments which may lead to a large spreading of hatred or abusive messages which have to be moderated. This paper presents the first French corpus annotated for sexism detection composed of about 12,000 tweets. In a context of offensive content mediation on social media now regulated by European laws, we think that it is important to be able to detect automatically not only sexist content but also to identify if a message with a sexist content is really sexist (i.e. addressed to a woman or describing a woman or women in general) or is a story of sexism experienced by a woman. This point is the novelty of our annotation scheme. We also propose some preliminary results for sexism detection obtained with a deep learning approach. Our experiments show encouraging results.

pdf bib abs

He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist
Patricia Chiril | Véronique Moriceau | Farah Benamara | Alda Mari | Gloria Origgi | Marlène Coulomb-Gully
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In a context of offensive content mediation on social media now regulated by European laws, it is important not only to be able to automatically detect sexist content but also to identify if a message with a sexist content is really sexist or is a story of sexism experienced by a woman. We propose: (1) a new characterization of sexist content inspired by speech acts theory and discourse analysis studies, (2) the first French dataset annotated for sexism detection, and (3) a set of deep learning experiments trained on top of a combination of several tweet’s vectorial representations (word embeddings, linguistic features, and various generalization strategies). Our results are encouraging and constitute a first step towards offensive content moderation.

2006

pdf bib

2002

pdf bib

Under-specification and contextual variability of abstract
Alda Mari
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

Alda Mari

2025

2024

2023

2022

2020

2006

2002

Co-authors

Venues