Frédérique Segond

Also published as: Frederique Segond

2023

Comparative Analysis of Anomaly Detection Algorithms in Text Data
Yizhou Xu | Kata Gábor | Jérôme Milleret | Frédérique Segond
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Text anomaly detection (TAD) is a crucial task that aims to identify texts that deviate significantly from the norm within a corpus. Despite its importance in various domains, TAD remains relatively underexplored in natural language processing. This article presents a systematic evaluation of 22 TAD algorithms on 17 corpora using multiple text representations, including monolingual and multilingual SBERT. The performance of the algorithms is compared based on three criteria: degree of supervision, theoretical basis, and architecture used. The results demonstrate that semi-supervised methods utilizing weak labels outperform both unsupervised methods and semi-supervised methods using only negative samples for training. Additionally, we explore the application of TAD techniques in hate speech detection. The results provide valuable insights for future TAD research and guide the selection of suitable algorithms for detecting text anomalies in different contexts.

Human Value Detection from Bilingual Sensory Product Reviews
Boyu Niu | Céline Manetta | Frédérique Segond
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

We applied text classification methods to a corpus of product reviews that we created with the help of a questionnaire. We found that for certain values, “traditional” deep neural networks like CNN can give promising results compared to the baseline. We propose some ideas to improve the results in the future. The bilingual corpus we created, which contains more than 16,000 consumer reviews associated with the human value profiles of their authors, can be used for different marketing purposes.

2022

Annotation of Messages from Social Media for Influencer Detection
Kevin Deturck | Damien Nouvel | Namrata Patel | Frédérique Segond
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

To develop an influencer detection system, we designed an influence model based on the analysis of conversations in the “Change My View” debate forum. This led us to identify enunciative features (argumentation, emotion expression, view change, ...) related to influence between participants. In this paper, we present the annotation campaign we conducted to build a reference corpus for these enunciative features. The annotation task was to identify, in social media posts, the text segments that corresponded to each enunciative feature. The posts to be annotated were extracted from two social media platforms: the “Change My View” debate forum, with discussions on various topics, and Twitter, with posts from users identified as supporters of ISIS (Islamic State of Iraq and Syria). Over a thousand posts were double or triple annotated over five annotation sessions involving a total of 27 annotators. Some of the sessions involved the same annotators, which allowed us to analyse the evolution of their annotation work. Most of the sessions resulted in a reconciliation phase between the annotators, allowing for discussion and iterative improvement of the guidelines. We measured and analysed inter-annotator agreement over the course of the sessions, which allowed us to validate our iterative approach.

Détection des influenceurs dans des médias sociaux par une approche hybride (Influencer detection in social media, a hybrid approach)
Kevin Deturck | Damien Nouvel | Namrata Patel | Frederique Segond
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Social influence is an important phenomenon in various fields, such as economics and politics, and it has gained resonance with the popularity of social media, in particular social networks and forums. Most work on this topic proposes approaches based on theories from the human sciences (sociology, linguistics) and on network analysis techniques (propagation and centrality measures) or NLP techniques. In this article, we present an influence model inspired by work in social psychology, on which we build a system that combines an NLP module for detecting messages reflecting influence processes with a centrality analysis of the transmission of these messages. Our experiments on the Change My View debate forum show that the hybrid approach, compared with centrality alone, helps detect influencers more accurately.

Détection d’anomalies textuelles à base de l’ingénierie d’invite (Prompt Engineering-Based Text Anomaly Detection)
Yizhou Xu | Kata Gábor | Leila Khouas | Frédérique Segond
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Text anomaly detection is an important task in text mining. Several general approaches aimed at identifying outlying data points have been applied in this area. However, these approaches make little use of recent advances in natural language processing (NLP). The advent of pre-trained language models such as BERT and GPT-2 has given rise to a new machine learning paradigm called prompt engineering, which has shown good performance on several NLP tasks. This article presents exploratory work examining whether textual anomalies can be detected using prompt engineering. In our experiments, we examined the performance of different prompt templates. The results showed that prompt engineering is a promising method for text anomaly detection.

2016

Encoding Adjective Scales for Fine-grained Resources
Cédric Lopez | Frédérique Segond | Christiane Fellbaum
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We propose an automatic approach towards determining the relative location of adjectives on a common scale based on their strength. We focus on adjectives expressing different degrees of goodness occurring in French product (perfumes) reviews. Using morphosyntactic patterns, we extract from the reviews short phrases consisting of a noun that encodes a particular aspect of the perfume and an adjective modifying that noun. We then associate each such n-gram with the corresponding product aspect and its related star rating. Next, based on the star scores, we generate adjective scales reflecting the relative strength of specific adjectives associated with a shared attribute of the product. An automatic ordering of the adjectives “correct” (correct), “sympa” (nice), “bon” (good) and “excellent” (excellent) according to their score in our resource is consistent with an intuitive scale based on human judgments. Our long-term objective is to generate different adjective scales in an empirical manner, which could allow the enrichment of lexical resources.

Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs Machine Learning
Ioannis Partalas | Cédric Lopez | Frédérique Segond
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)

Named-Entity Recognition concerns the classification of textual objects into a predefined set of categories such as persons, organizations, and locations. While Named-Entity Recognition has been well studied for 20 years, its application to specialized domains still poses challenges for current systems. We developed a rule-based system and two machine learning approaches to tackle the same task: recognition of product names, brand names, etc., in the domain of Cosmetics, for French. Our systems can thus be compared under ideal conditions. In this paper, we introduce both systems and we compare them.

2015

Un système expert fondé sur une analyse sémantique pour l’identification de menaces d’ordre biologique (An expert system based on semantic analysis for the identification of biological threats)
Cédric Lopez | Aleksandra Ponomareva | Cécile Robin | André Bittar | Xabier Larrucea | Frédérique Segond | Marie-Hélène Metzger
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

The European project TIER (Integrated strategy for CBRN – Chemical, Biological, Radiological and Nuclear – Threat Identification and Emergency Response) aims to build a comprehensive and integrated strategy for emergency response in the context of biological, chemical, radiological, nuclear, or explosives-related hazards, based on threat identification and risk assessment. In this article, we focus on biological risks. We present our expert system based on semantic analysis, which extracts structured data from unstructured data for the purpose of reasoning.

2014

Generating a Resource for Products and Brandnames Recognition. Application to the Cosmetic Domain.
Cédric Lopez | Frédérique Segond | Olivier Hondermarck | Paolo Curtoni | Luca Dini
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Named Entity Recognition task needs high-quality and large-scale resources. In this paper, we present RENCO, a rule-based system focused on the recognition of entities in the Cosmetic domain (brandnames, product names, …). RENCO has two main objectives: 1) Generating resources for named entity recognition; 2) Mining new named entities relying on the previously generated resources. In order to build lexical resources for the cosmetic domain, we propose a system based on local lexico-syntactic rules complemented by a learning module. As the outcome of the system, we generate both a simple lexicon and a structured lexicon. Results of the evaluation show that even if RENCO outperforms a classic Conditional Random Fields algorithm, both systems should combine their respective strengths.

2012

Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Frédérique Segond
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Développement d’un système de détection des infections associées aux soins à partir de l’analyse de comptes-rendus d’hospitalisation (Development of a system that detects occurrences of healthcare-associated infections from the analysis of hospitalization reports)
Caroline Hagège | Denys Proux | Quentin Gicquel | Stéfan Darmoni | Suzanne Pereira | Frédérique Segond | Marie-Helène Metzger
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This article describes the first version of a system for detecting episodes of healthcare-associated infections, along with its evaluation results. Detection is based on the automatic analysis of hospitalization reports from different hospitals and different departments. These reports are in free text. The detection system was developed from a linguistic analyzer that we adapted to the medical domain and that extracts from the documents cues that may lead to a suspicion of infection. Negation processing and temporal processing of the texts are performed to restrict and refine cue extraction. In this article, we describe the system we developed and give the results of a preliminary evaluation.

Architecture and Systems for Monitoring Hospital Acquired Infections inside Hospital Information Workflows
Denys Proux | Caroline Hagège | Quentin Gicquel | Suzanne Pereira | Stefan Darmoni | Frédérique Segond | Marie-Hélène Metzger
Proceedings of the Second Workshop on Biomedical Natural Language Processing