Raphael Troncy

Also published as: Raphaël Troncy

2025

Peut-on faire confiance aux juges ? Validation de méthodes d’évaluation de la factualité par perturbation des réponses
Giovanni Gatti Pinheiro | Sarra Gharsallah | Adèle Robaldo | Mariia Tokareva | Ilyana Guendouz | Raphaël Troncy | Paolo Papotti | Pietro Michiardi
Actes de l'atelier Évaluation des modèles génératifs (LLM) et challenge 2025 (EvalLLM)

Évaluer la véracité des grands modèles de langage (LLMs) est essentiel pour de nombreuses applications. Cependant, nos outils d’évaluation sont-ils eux-mêmes fiables ? Malgré la prolifération des métriques de factualité, leur sensibilité et leur fiabilité restent peu étudiées. Cet article introduit un cadre de méta-évaluation qui teste systématiquement ces métriques en appliquant des corruptions contrôlées à des réponses de référence. Notre méthode génère des sorties classées selon des degrés connus de dégradation afin d’analyser comment les métriques capturent les variations subtiles de véracité. Nos expériences montrent que les méthodes disponibles dans les framework d’évaluation, telles que la métrique factual correctness de RAGAS, suivent mieux la dégradation que les approches de type LLM-as-judge. Nous proposons également une nouvelle variante de la métrique de factualité, à la fois compétitive et économique.

pdf bib abs

Automated Detection of Tropes In Short Texts
Alessandra Flaccavento | Youri Peskine | Paolo Papotti | Riccardo Torlone | Raphael Troncy
Proceedings of the 31st International Conference on Computational Linguistics

Tropes — recurring narrative elements like the “smoking gun” or the “veil of secrecy” — are often used in movies to convey familiar patterns. However, they also play a significant role in online communication about societal issues, where they can oversimplify complex matters and deteriorate public discourse. Recognizing these tropes can offer insights into the emotional manipulation and potential bias present in online discussions. This paper addresses the challenge of automatically detecting tropes in social media posts. We define the task, distinguish it from previous work, and create a ground-truth dataset of social media posts related to vaccines and immigration, manually labeled with tropes. Using this dataset, we develop a supervised machine learning technique for multi-label classification, fine-tune a model, and demonstrate its effectiveness experimentally. Our results show that tropes are common across domains and that fine-tuned models can detect them with high accuracy.

2024

pdf bib abs

EURECOM at SemEval-2024 Task 4: Hierarchical Loss and Model Ensembling in Detecting Persuasion Techniques
Youri Peskine | Raphael Troncy | Paolo Papotti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes the submission of team EURECOM at SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes. We only tackled the first sub-task, consisting of detecting 20 named persuasion techniques in the textual content of memes. We trained multiple BERT-based models (BERT, RoBERTa, BERT pre-trained on harmful detection) using different losses (Cross Entropy, Binary Cross Entropy, Focal Loss and a custom-made hierarchical loss). The best results were obtained by leveraging the hierarchical nature of the data, by outputting ancestor classes and with a hierarchical loss. Our final submission consist of an ensembling of our top-3 best models for each persuasion techniques. We obtain hierarchical F1 scores of 0.655 (English), 0.345 (Bulgarian), 0.442 (North Macedonian) and 0.178 (Arabic) on the test set.

2023

pdf bib abs

Large language models have recently risen in popularity due to their ability to perform many natural language tasks without requiring any fine-tuning. In this work, we focus on two novel ideas: (1) generating definitions from examples and using them for zero-shot classification, and (2) investigating how an LLM makes use of the definitions. We thoroughly analyze the performance of GPT-3 model for fine-grained multi-label conspiracy theory classification of tweets using zero-shot labeling. In doing so, we asses how to improve the labeling by providing minimal but meaningful context in the form of the definitions of the labels. We compare descriptive noun phrases, human-crafted definitions, introduce a new method to help the model generate definitions from examples, and propose a method to evaluate GPT-3’s understanding of the definitions. We demonstrate that improving definitions of class labels has a direct consequence on the downstream classification results.

pdf bib abs

D2KLab at SemEval-2023 Task 2: Leveraging T-NER to Develop a Fine-Tuned Multilingual Model for Complex Named Entity Recognition
Thibault Ehrhart | Julien Plu | Raphael Troncy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents D2KLab’s system used for the shared task of “Multilingual Complex Named Entity Recognition (MultiCoNER II)”, as part of SemEval 2023 Task 2. The system relies on a fine-tuned transformer based language model for extracting named entities. In addition to the architecture of the system, we discuss our results and observations.

2022

We present a benchmark in six European languages containing manually annotated information about olfactory situations and events following a FrameNet-like approach. The documents selection covers ten domains of interest to cultural historians in the olfactory domain and includes texts published between 1620 to 1920, allowing a diachronic analysis of smell descriptions. With this work, we aim to foster the development of olfactory information extraction approaches as well as the analysis of changes in smell descriptions over time.

2021

pdf bib abs

Computational fact-checking has gained a lot of traction in the machine learning and natural language processing communities. A plethora of solutions have been developed, but methods which leverage both structured and unstructured information to detect misinformation are of particular relevance. In this paper, we tackle the FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) challenge which consists of an open source baseline system together with a benchmark dataset containing 87,026 verified claims. We extend this baseline model by improving the evidence retrieval module yielding the best evidence F1 score among the competitors in the challenge leaderboard while obtaining an overall FEVEROUS score of 0.20 (5th best ranked system).

pdf bib abs

Zero-Shot Information Extraction to Enhance a Knowledge Graph Describing Silk Textiles
Thomas Schleider | Raphael Troncy
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

The knowledge of the European silk textile production is a typical case for which the information collected is heterogeneous, spread across many museums and sparse since rarely complete. Knowledge Graphs for this cultural heritage domain, when being developed with appropriate ontologies and vocabularies, enable to integrate and reconcile this diverse information. However, many of these original museum records still have some metadata gaps. In this paper, we present a zero-shot learning approach that leverages the ConceptNet common sense knowledge graph to predict categorical metadata informing about the silk objects production. We compared the performance of our approach with traditional supervised deep learning-based methods that do require training data. We demonstrate promising and competitive performance for similar datasets and circumstances and the ability to predict sometimes more fine-grained information. Our results can be reproduced using the code and datasets published at https://github.com/silknow/ZSL-KG-silk.

pdf bib abs

Apples to Apples: A Systematic Evaluation of Topic Models
Ismail Harrando | Pasquale Lisena | Raphael Troncy
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

From statistical to neural models, a wide variety of topic modelling algorithms have been proposed in the literature. However, because of the diversity of datasets and metrics, there have not been many efforts to systematically compare their performance on the same benchmarks and under the same conditions. In this paper, we present a selection of 9 topic modelling techniques from the state of the art reflecting a diversity of approaches to the task, an overview of the different metrics used to compare their performance, and the challenges of conducting such a comparison. We empirically evaluate the performance of these models on different settings reflecting a variety of real-life conditions in terms of dataset size, number of topics, and distribution of topics, following identical preprocessing and evaluation processes. Using both metrics that rely on the intrinsic characteristics of the dataset (different coherence metrics), as well as external knowledge (word embeddings and ground-truth topic labels), our experiments reveal several shortcomings regarding the common practices in topic models evaluation.

2020

pdf bib abs

TOMODAPI: A Topic Modeling API to Train, Use and Compare Topic Models
Pasquale Lisena | Ismail Harrando | Oussama Kandakji | Raphael Troncy
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)

From LDA to neural models, different topic modeling approaches have been proposed in the literature. However, their suitability and performance is not easy to compare, particularly when the algorithms are being used in the wild on heterogeneous datasets. In this paper, we introduce ToModAPI (TOpic MOdeling API), a wrapper library to easily train, evaluate and infer using different topic modeling algorithms through a unified interface. The library is extensible and can be used in Python environments or through a Web API.

2018

pdf bib abs

This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top scoring system for both English-to-German and English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.

pdf bib

Sanaphor++: Combining Deep Neural Networks with Semantics for Coreference Resolution
Julien Plu | Roman Prokofyev | Alberto Tonon | Philippe Cudré-Mauroux | Djellel Eddine Difallah | Raphaël Troncy | Giuseppe Rizzo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs

JeuxDeLiens: Word Embeddings and Path-Based Similarity for Entity Linking using the French JeuxDeMots Lexical Semantic Network
Julien Plu | Kevin Cousot | Mathieu Lafourcade | Raphaël Troncy | Giuseppe Rizzo
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Entity linking systems typically rely on encyclopedic knowledge bases such as DBpedia or Freebase. In this paper, we use, instead, a French lexical-semantic network named JeuxDeMots to jointly type and link entities. Our approach combines word embeddings and a path-based similarity resulting in encouraging results over a set of documents from the French Le Monde newspaper.

2017

pdf bib abs

SentiME++ at SemEval-2017 Task 4: Stacking State-of-the-Art Classifiers to Enhance Sentiment Classification
Raphaël Troncy | Enrico Palumbo | Efstratios Sygkounas | Giuseppe Rizzo
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper, we describe the participation of the SentiME++ system to the SemEval 2017 Task 4A “Sentiment Analysis in Twitter” that aims to classify whether English tweets are of positive, neutral or negative sentiment. SentiME++ is an ensemble approach to sentiment analysis that leverages stacked generalization to automatically combine the predictions of five state-of-the-art sentiment classifiers. SentiME++ achieved officially 61.30% F1-score, ranking 12th out of 38 participants.

2016

pdf bib abs

Context-enhanced Adaptive Entity Linking
Filip Ilievski | Giuseppe Rizzo | Marieke van Erp | Julien Plu | Raphaël Troncy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

More and more knowledge bases are publicly available as linked data. Since these knowledge bases contain structured descriptions of real-world entities, they can be exploited by entity linking systems that anchor entity mentions from text to the most relevant resources describing those entities. In this paper, we investigate adaptation of the entity linking task using contextual knowledge. The key intuition is that entity linking can be customized depending on the textual content, as well as on the application that would make use of the extracted information. We present an adaptive approach that relies on contextual knowledge from text to enhance the performance of ADEL, a hybrid linguistic and graph-based entity linking system. We evaluate our approach on a domain-specific corpus consisting of annotated WikiNews articles.

2014

pdf bib abs

Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web
Giuseppe Rizzo | Marieke van Erp | Raphaël Troncy
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Named entity recognition and disambiguation are of primary importance for extracting information and for populating knowledge bases. Detecting and classifying named entities has traditionally been taken on by the natural language processing community, whilst linking of entities to external resources, such as those in DBpedia, has been tackled by the Semantic Web community. As these tasks are treated in different communities, there is as yet no oversight on the performance of these tasks combined. We present an approach that combines the state-of-the art from named entity recognition in the natural language processing domain and named entity linking from the semantic web community. We report on experiments and results to gain more insights into the strengths and limitations of current approaches on these tasks. Our approach relies on the numerous web extractors supported by the NERD framework, which we combine with a machine learning algorithm to optimize recognition and linking of named entities. We test our approach on four standard data sets that are composed of two diverse text types, namely newswire and microposts.

Raphael Troncy

2025

2024

2023

2022

2021

2020

2018

2017

2016

2014

2012

Co-authors

Venues