Romain Meunier
2025
CrisisTS: Coupling Social Media Textual Data and Meteorological Time Series for Urgency Classification
Romain Meunier | Farah Benamara | Véronique Moriceau | Zhongzheng Qiao | Savitha Ramasamy
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper proposes CrisisTS, the first multimodal and multilingual dataset for urgency classification. It is composed of benchmark crisis datasets from French and English social media covering various expected (e.g., floods, storms) and sudden (e.g., earthquakes, explosions) crises, which have been mapped to open-source geocoded meteorological time series data. This mapping is based on a simple and effective strategy that allows for temporal and location alignment even in the absence of location mentions in the text. A set of multimodal experiments has been conducted, relying on transformers and LLMs, to improve overall performance while ensuring model generalizability. Our results show that modality fusion outperforms text-only models.
2024
Digging Communicative Intentions: The Case of Crises Events
Farah Benamara | Alda Mari | Romain Meunier | Véronique Moriceau | Leila Moudjari | Valentin Tinarrage
Dialogue Discourse Volume 15
In emergency situations, users of social networks convey all sorts of what have been called communicative intentions, well known since the work of Austin (1962) and Searle (1969) as speech acts (SA). While speech acts have been the focus of close scrutiny in the philosophical and linguistic literature (see Portner (2018) for extended discussion), their role has only rarely been understood and exploited in processing social media content about crisis events, our focus here. Current work on communicative intentions in social media is topic-oriented, focusing on the correlation between SA and specific topics such as crises (e.g., earthquakes) but also politics, celebrities, cooking, travel, etc. It has been observed that people globally tend to react to natural disasters with SA distinct from those used in other contexts (e.g., celebrities, which are essentially made up of comments). Here, we explore the further hypothesis of a correlation between different SA types and urgency, and propose an in-depth linguistic and computational analysis of communicative intentions in tweets from an urgency-oriented perspective. Indeed, SA are mostly relevant for identifying intentions, desires, plans and preferences towards action, and ultimately for producing a system intended to help rescue teams. Our contribution is four-fold and consists of: (1) a two-layer annotation scheme of speech acts at both the tweet and sub-tweet levels; (2) a new French dataset of about 13K tweets annotated for both urgency and SA, targeting both expected (e.g., storms) and unexpected or sudden (e.g., building collapse, explosion) events; (3) a thorough analysis of the annotations, studying in particular the correlation between SA and the urgency of the message, SA and intention-to-act categories (e.g., human damages), and SA and crisis types; and (4) a set of deep learning experiments to detect SA in crisis-related corpora.
Our results show a strong correlation between SA and urgency annotations at both the tweet and sub-tweet levels, with a particularly salient correlation in the latter case, which constitutes a first important step towards SA-aware NLP-based crisis management on social media.
2023
Classification de tweets en situation d’urgence pour la gestion de crises
Romain Meunier | Leila Moudjari | Farah Benamara | Véronique Moriceau | Alda Mari | Patricia Stolf
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux -- articles longs
Real-time processing of social media data has become an attractive tool in emergency situations, but information overload remains a challenge. In this article, we present a new manually annotated French dataset for crisis management. We also test several machine learning models to classify tweets according to their relevance, their urgency, and the intention they convey, in order to best assist emergency services during crises, using evaluation methods specific to crisis management. We also evaluate our models when they are confronted with new crises or even new types of crises, with encouraging results.
Image and Text: Fighting the same Battle? Super Resolution Learning for Imbalanced Text Classification
Romain Meunier | Farah Benamara | Véronique Moriceau | Patricia Stolf
Findings of the Association for Computational Linguistics: EMNLP 2023
In this paper, we propose SRL4NLP, a new approach for data augmentation that draws an analogy between image and text processing: super-resolution learning. This method is based on using high-resolution images to overcome the problem of low-resolution images. While this technique is commonly used in image processing when images have a low resolution or are too noisy, it has never been used in NLP. We therefore propose the first adaptation of this method for text classification and evaluate its effectiveness on urgency detection from tweets posted in crisis situations, a very challenging task where messages are scarce and highly imbalanced. We show that this strategy is efficient when compared to competitive state-of-the-art data augmentation techniques on several benchmark datasets in two languages.