Philippe Muller

2025

Supervision faible pour la classification des relations discursives (Weak supervision for discourse relation classification)
Khalil Maachou | Chloé Braud | Philippe Muller
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

Identifying discourse relations is important for understanding the semantic links that structure a text, but the task suffers from a lack of data that limits performance. At the same time, many discourse corpora exist; however, divergences between annotation projects prevent these datasets from being directly combined for training. We propose to address this problem by exploiting the weak supervision framework, whose goal is to generate annotations from varied sources, such as heuristics or pretrained models. These noisy and partial annotations are then combined to train a model on the task. By combining this method with strategies for handling differences in label sets, we show that it is possible to obtain performance close to that of a fully supervised system while relying on a very small part of the original data, thus opening up prospects for improvement in low-resource domains or languages.

2024

Feature-augmented model for multilingual discourse relation classification
Eleni Metheniti | Chloé Braud | Philippe Muller
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)

Discourse relation classification within a multilingual, cross-framework setting is a challenging task, and the best-performing systems so far have relied on monolingual and mono-framework approaches. In this paper, we introduce transformer-based multilingual models, trained jointly over all datasets, thus covering different languages and discourse frameworks. We demonstrate their ability to outperform single-corpus models and to overcome (to some extent) the disparity among corpora, by relying on linguistic features and generic information about the nature of the datasets. We also compare the performance of different multilingual pretrained models, as well as the encoding of the relation direction, a key component for the task. Our results on the 16 datasets of the DISRPT 2021 benchmark show improvements in accuracy in (almost) all datasets compared to the monolingual models, with at best 65.91% in average accuracy, thus corresponding to a 4% improvement over the state-of-the-art.

Complex question generation using discourse-based data augmentation
Khushnur Jahangir | Philippe Muller | Chloé Braud
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)

Question Generation (QG), the process of generating meaningful questions from a given context, has proven to be useful for several tasks such as question answering or FAQ generation. While most existing QG techniques generate simple, fact-based questions, this research aims to generate questions that can have complex answers (e.g. “why” questions). We propose a data augmentation method that uses discourse relations to create such questions, and experiment on existing English data. Our approach generates questions based solely on the context without answer supervision, in order to enhance question diversity and complexity. We use an encoder-decoder trained on the augmented dataset to generate either one question or multiple questions at a time, and show that the latter improves over the baseline model when doing a human quality evaluation, without degrading performance according to standard automated metrics.

In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models
Ayrton San Joaquin | Bin Wang | Zhengyuan Liu | Nicholas Asher | Brian Lim | Philippe Muller | Nancy F. Chen
Findings of the Association for Computational Linguistics: EMNLP 2024

Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Access to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and evaluation samples with a trained model. Notably, we assess the model’s internal gradients to estimate this relationship, aiming to rank the contribution of each training point. To enhance efficiency, we propose an optimization to compute influence functions with a reduced number of layers while achieving similar accuracy. By applying our algorithm to instruction fine-tuning data of LLMs, we can achieve similar performance with just 50% of the training data. Meanwhile, using influence functions to analyze model coverage of certain test samples could provide a reliable and interpretable signal on the training set’s coverage of those test points.

DISRPT: A Multilingual, Multi-domain, Cross-framework Benchmark for Discourse Processing
Chloé Braud | Amir Zeldes | Laura Rivière | Yang Janet Liu | Philippe Muller | Damien Sileo | Tatsuya Aoyama
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents DISRPT, a multilingual, multi-domain, and cross-framework benchmark dataset for discourse processing, covering the tasks of discourse unit segmentation, connective identification, and relation classification. DISRPT includes 13 languages, with data from 24 corpora covering about 4 million tokens and around 250,000 discourse relation instances from 4 discourse frameworks: RST, SDRT, PDTB, and Discourse Dependencies. We present an overview of the data, its development across three NLP shared tasks on discourse processing carried out in the past five years, and the latest modifications and added extensions. We also carry out an evaluation of state-of-the-art multilingual systems trained on the data for each task, showing plateau performance on segmentation, but important room for improvement for connective identification and relation classification. The DISRPT benchmark employs a unified format that we make available on GitHub and HuggingFace in order to encourage future work on discourse processing across languages, domains, and frameworks.

Zero-shot Learning for Multilingual Discourse Relation Classification
Eleni Metheniti | Philippe Muller | Chloé Braud | Margarita Hernández Casas
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Classifying discourse relations is known to be a hard task that relies on complex cues. In addition, discourse-annotated data is scarce, especially for languages other than English: many corpora of limited size exist for several languages, but the field is split between different theoretical frameworks that have a huge impact on the nature of the textual spans to be linked and on the label set used. Moreover, each annotation project implements modifications compared to the theoretical background and other projects. These discrepancies hinder the development of systems that take advantage of all the available data to tackle data sparsity, and work on transfer between languages is very limited, and almost nonexistent between frameworks, even though it could improve our understanding of some theoretical aspects and enhance many applications. In this paper, we propose the first experiments on zero-shot learning for discourse relation classification and investigate several paths in the way source data can be combined, either based on languages, frameworks, or similarity measures. We demonstrate how difficult transfer is for the task at hand, and that the most impactful factor is label set divergence, where the notion of underlying framework possibly conceals crucial disagreements.

2023

Proceedings of the 3rd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2023)
Chloé Braud | Yang Janet Liu | Eleni Metheniti | Philippe Muller | Laura Rivière | Attapol Rutherford | Amir Zeldes
Proceedings of the 3rd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2023)

The DISRPT 2023 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification
Chloé Braud | Yang Janet Liu | Eleni Metheniti | Philippe Muller | Laura Rivière | Attapol Rutherford | Amir Zeldes
Proceedings of the 3rd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2023)

In 2023, the third iteration of the DISRPT Shared Task (Discourse Relation Parsing and Treebanking) was held, dedicated to the underlying units used in discourse parsing across formalisms. Following the success of the 2019 and 2021 tasks on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification, this iteration has added 10 new corpora, including 2 new languages (Thai and Italian) and 3 discourse treebanks annotated in the discourse dependency representation in addition to the previously included frameworks: RST, SDRT, and PDTB. In this paper, we review the data included in the Shared Task, which covers 26 datasets across 13 languages, survey and compare submitted systems, and report on system performance on each task for both annotated and plain-tokenized versions of the data.

DisCut and DiscReT: MELODI at DISRPT 2023
Eleni Metheniti | Chloé Braud | Philippe Muller | Laura Rivière
Proceedings of the 3rd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2023)

This paper presents the results obtained by the MELODI team for the three tasks proposed within the DISRPT 2023 shared task on discourse: segmentation, connective identification, and relation classification. The competition involves corpora in various languages in several underlying frameworks, and proposes two tracks depending on the presence or not of annotations of sentence boundaries and syntactic information. For these three tasks, we rely on a transformer-based architecture, and investigate several optimizations of the models, including hyper-parameter search and layer freezing. For discourse relations, we also explore the use of adapters (a lightweight solution for model fine-tuning) and introduce relation mappings to partially deal with the label set explosion we are facing within the setting of the shared task in a multi-corpus perspective. In the end, we propose one single architecture for segmentation and connectives, based on XLM-RoBERTa large, frozen at lower layers, with new state-of-the-art results for segmentation, and we propose 3 different models for relations, since the task makes it harder to generalize across all corpora.

An Integrated Approach for Political Bias Prediction and Explanation Based on Discursive Structure
Nicolas Devatine | Philippe Muller | Chloé Braud
Findings of the Association for Computational Linguistics: ACL 2023

One crucial aspect of democracy is fair information sharing. While it is hard to prevent biases in news, they should be identified for better transparency. We propose an approach to automatically characterize biases that takes into account structural differences and that is efficient for long texts. This yields new ways to provide explanations for a textual classifier, going beyond mere lexical cues. We show that: (i) the use of discourse-based structure-aware document representations compares well to local, computationally heavy, or domain-specific models on classification tasks that deal with textual bias, and (ii) our approach based on different levels of granularity allows for the generation of better explanations of model decisions, both at the lexical and structural level, while addressing the challenge posed by long texts.

Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
Laurent Prevot | Julie Hunter | Philippe Muller
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

While discourse parsing has made considerable progress in recent years, discourse segmentation of conversational speech remains a difficult issue. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on manual segmentation vs. using hand-crafted labelling rules to develop a weakly supervised segmenter. Our results show that both approaches yield similar performance in terms of F-score, while data programming requires less manual annotation work. In a second experiment, we vary the amount of training data used for fine-tuning systems and show that a small amount of hand-labelled data is enough to obtain good results (although significantly lower than in the first experiment using all the annotated data available).

MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles
Nicolas Devatine | Philippe Muller | Chloé Braud
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our approach to Subtask 1 “News Genre Categorization” of SemEval-2023 Task 3 “Detecting the Category, the Framing, and the Persuasion Techniques in Online News in a Multi-lingual Setup”, which aims to determine whether a given news article is an opinion piece, an objective report, or satirical. We fine-tuned the domain-specific language model POLITICS, which was pre-trained on a large-scale dataset of more than 3.6M English political news articles following ideology-driven pre-training objectives. In order to use it in the multilingual setup of the task, we added as a pre-processing step the translation of all documents into English. Our system ranked among the top systems overall in most languages, and ranked 1st on the English dataset.

2022

Predicting Political Orientation in News with Latent Discourse Structure to Improve Bias Understanding
Nicolas Devatine | Philippe Muller | Chloé Braud
Proceedings of the 3rd Workshop on Computational Approaches to Discourse

With the growing number of information sources, the problem of media bias becomes worrying for a democratic society. This paper explores the task of predicting the political orientation of news articles, with a goal of analyzing how bias is expressed. We demonstrate that integrating rhetorical dimensions via latent structures over sub-sentential discourse units allows for large improvements, with a +7.4 points difference between the base LSTM model and its discourse-based version, and +3 points improvement over the previous BERT-based state-of-the-art model. We also argue that this gives a new relevant handle for analyzing political bias in news articles.

A Pragmatics-Centered Evaluation Framework for Natural Language Understanding
Damien Sileo | Philippe Muller | Tim Van de Cruys | Camille Pradel
Proceedings of the Thirteenth Language Resources and Evaluation Conference

New models for natural language understanding have recently made an unparalleled amount of progress, which has led some researchers to suggest that the models induce universal text representations. However, current benchmarks are predominantly targeting semantic phenomena; we make the case that pragmatics needs to take center stage in the evaluation of natural language understanding. We introduce PragmEval, a new benchmark for the evaluation of natural language understanding, that unites 11 pragmatics-focused evaluation datasets for English. PragmEval can be used as supplementary training data in a multi-task learning setup, and is publicly available, alongside the code for gathering and preprocessing the datasets. Using our evaluation suite, we show that natural language inference, a widely used pretraining task, does not result in genuinely universal representations, which presents a new challenge for multi-task learning.

2021

Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)
Amir Zeldes | Yang Janet Liu | Mikel Iruskieta | Philippe Muller | Chloé Braud | Sonia Badene
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)

The DISRPT 2021 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification
Amir Zeldes | Yang Janet Liu | Mikel Iruskieta | Philippe Muller | Chloé Braud | Sonia Badene
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)

In 2021, we organized the second iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task (Discourse Relation Parsing and Treebanking). Adding to the 2019 tasks on Elementary Discourse Unit Segmentation and Connective Detection, this iteration of the Shared Task included for the first time a track on discourse relation classification across three formalisms: RST, SDRT, and PDTB. In this paper we review the data included in the Shared Task, which covers nearly 3 million manually annotated tokens from 16 datasets in 11 languages, survey and compare submitted systems and report on system performance on each task for both annotated and plain-tokenized versions of the data.

Multi-lingual Discourse Segmentation and Connective Identification: MELODI at Disrpt2021
Morteza Kamaladdini Ezzabady | Philippe Muller | Chloé Braud
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)

We present an approach for discourse segmentation and discourse connective identification, both at the sentence and document level, within the Disrpt 2021 shared task, a multi-lingual and multi-formalism evaluation campaign. Building on the most successful architecture from the similar 2019 shared task, we leverage datasets in the same or similar languages to augment training data and improve on the best systems from the previous campaign on 3 out of 4 subtasks, with a mean improvement on all 16 datasets of 0.85%. Within the Disrpt 21 campaign the system ranks 3rd overall, very close to the 2nd system, but with a significant gap with respect to the best system, which uses a rich set of additional features. The system is nonetheless the best on languages that benefited from crosslingual training on sentence internal segmentation (German and Spanish).

Weakly supervised discourse segmentation for multiparty oral conversations
Lila Gravellier | Julie Hunter | Philippe Muller | Thomas Pellegrini | Isabelle Ferrané
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Discourse segmentation, the first step of discourse analysis, has been shown to improve results for text summarization, translation and other NLP tasks. While segmentation models for written text tend to perform well, they are not directly applicable to spontaneous, oral conversation, which has linguistic features foreign to written text. Segmentation is less studied for this type of language, where annotated data is scarce and existing corpora are more heterogeneous. We develop a weak supervision approach to adapt, using minimal annotation, a state-of-the-art discourse segmenter trained on written text to French conversation transcripts. Supervision is given by a latent model bootstrapped by manually defined heuristic rules that use linguistic and acoustic information. The resulting model improves the original segmenter, especially in contexts where information on speaker turns is lacking or noisy, gaining up to 13% in F-score. Evaluation is performed on data like those used to define our heuristic rules, but also on transcripts from two other corpora.

Plongements Interprétables pour la Détection de Biais Cachés (Interpretable Embeddings for Hidden Biases Detection)
Tom Bourgeade | Philippe Muller | Tim Van de Cruys
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Many semantic tasks in NLP rely on data collected semi-automatically, which is often a source of undesirable artifacts that can negatively affect models trained on them. With the more recent shift towards more complex, less interpretable, general-purpose pretrained models, these biases can lead to undesirable correlations being built into user-facing applications. Recently, a few methods have been proposed for training word embeddings with better interpretability. We propose a simple method that exploits these representations to preemptively detect easy-to-learn lexical correlations in various datasets. To this end, we evaluate a few popular interpretable embedding models for English, using both an intrinsic evaluation and a set of downstream semantic tasks, and we use the interpretable quality of the embeddings to diagnose potential biases in the associated datasets.

2020

DiscSense: Automated Semantic Analysis of Discourse Markers
Damien Sileo | Tim Van de Cruys | Camille Pradel | Philippe Muller
Proceedings of the Twelfth Language Resources and Evaluation Conference

Using a model trained to predict discourse markers between sentence pairs, we predict plausible markers between sentence pairs with a known semantic relation (provided by existing classification datasets). These predictions allow us to study the link between discourse markers and the semantic relations annotated in classification datasets. Handcrafted mappings have been proposed between markers and discourse relations on a limited set of markers and a limited set of categories, but there exist hundreds of discourse markers expressing a wide variety of relations, and there is no consensus on the taxonomy of relations between competing discourse theories (which are largely built in a top-down fashion). By using an automatic prediction method over existing semantically annotated datasets, we provide a bottom-up characterization of discourse markers in English. The resulting dataset, named DiscSense, is publicly available.

Traitement Automatique des Langues, Volume 61, Numéro 3 : Dialogue et systèmes de dialogue [Dialogue and dialogue systems]
Liesbeth Degand | Philippe Muller
Traitement Automatique des Langues, Volume 61, Numéro 3 : Dialogue et systèmes de dialogue [Dialogue and dialogue systems]

Introduction to the Special Issue on Dialogue and Dialogue Systems
Liesbeth Degand | Philippe Muller
Traitement Automatique des Langues, Volume 61, Numéro 3 : Dialogue et systèmes de dialogue [Dialogue and dialogue systems]

2019

Analyse faiblement supervisée de conversation en actes de dialogue (Weakly supervised dialog act analysis)
Catherine Thompson | Nicholas Asher | Philippe Muller | Jérémy Auguste
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

We are interested here in the analysis of chat conversations in a task-oriented setting in which a technical advisor talks to a customer, where the goal is to label utterances with dialogue acts in order to feed downstream conversation analyses. We propose a lightly supervised method based on simple heuristics, a few development annotations, and an ensemble method over these rules that is used to automatically annotate a larger corpus, with noise, which can then serve as training data for a supervised model. We compare this approach with a classical supervised approach and show that it achieves very similar results, at a lower cost and while being easier to adapt to new data.

Représentation sémantique distributionnelle et alignement de conversations par chat (Distributional semantic representation and alignment of online chat conversations)
Tom Bourgeade | Philippe Muller
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Textual similarity measures play an important role in NLP because of their many applications, notably in information retrieval and classification. Dialogue, by contrast, has received less attention in this respect. We are interested here in producing a similarity measure for a corpus of chat conversations using unsupervised methods that exploit, at different levels, the notion of distributional semantics in the form of embeddings. At the same time, to enrich the measure and allow a better interpretation of the results, we establish explicit alignments of speaker turns across conversations by exploiting the Wasserstein distance, which makes it possible to take their structural dimension into account. Finally, we evaluate our approach with an external task on the small annotated part of the corpus, and observe that it gives better results than a more naive averaging-based variant.

Apprentissage non-supervisé pour l’appariement et l’étiquetage de cas cliniques en français - DEFT2019 (Unsupervised learning for matching and labelling of French clinical cases - DEFT2019)
Damien Sileo | Tim Van de Cruys | Philippe Muller | Camille Pradel
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Défi Fouille de Textes (atelier TALN-RECITAL)

We present the system used by the Synapse/IRIT team in the DEFT2019 competition, which covered two tasks related to clinical cases written in French: one on matching clinical cases with discussions, the other on keyword extraction. One distinctive feature is the use of unsupervised learning for both tasks, on a corpus built specifically for the medical domain in French.

Mining Discourse Markers for Unsupervised Sentence Representation Learning
Damien Sileo | Tim Van De Cruys | Camille Pradel | Philippe Muller
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Current state-of-the-art systems in NLP heavily rely on manually annotated datasets, which are expensive to construct. Very little work adequately exploits unannotated data – such as discourse markers between sentences – mainly because of data sparseness and ineffective extraction methods. In the present work, we propose a method to automatically discover sentence pairs with relevant discourse markers, and apply it to massive amounts of data. Our resulting dataset contains 174 discourse markers with at least 10k examples each, even for rare markers such as “coincidentally” or “amazingly”. We use the resulting data as supervision for learning transferable sentence embeddings. In addition, we show that even though sentence representation learning through prediction of discourse markers yields state-of-the-art results across different transfer tasks, it is not clear that our models made use of the semantic relation between sentences, thus leaving room for further improvements.

Composition of Sentence Embeddings: Lessons from Statistical Relational Learning
Damien Sileo | Tim Van De Cruys | Camille Pradel | Philippe Muller
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Various NLP problems – such as the prediction of sentence similarity, entailment, and discourse relations – are all instances of the same general task: the modeling of semantic relations between a pair of textual elements. A popular model for such problems is to embed sentences into fixed size vectors, and use composition functions (e.g. concatenation or sum) of those vectors as features for the prediction. At the same time, composition of embeddings has been a main focus within the field of Statistical Relational Learning (SRL) whose goal is to predict relations between entities (typically from knowledge base triples). In this article, we show that previous work on relation prediction between texts implicitly uses compositions from baseline SRL models. We show that such compositions are not expressive enough for several tasks (e.g. natural language inference). We build on recent SRL models to address textual relational problems, showing that they are more expressive, and can alleviate issues from simpler compositions. The resulting models significantly improve the state of the art in both transferable sentence representation learning and relation prediction.

ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents
Philippe Muller | Chloé Braud | Mathieu Morey
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Segmentation is the first step in building practical discourse parsers, and is often neglected in discourse parsing studies. The goal is to identify the minimal spans of text to be linked by discourse relations, or to isolate explicit marking of discourse relations. Existing systems on English report F1 scores as high as 95%, but they generally assume gold sentence boundaries and are restricted to English newswire texts annotated within the RST framework. This article presents a generic approach and a system, ToNy, a discourse segmenter developed for the DisRPT shared task where multiple discourse representation schemes, languages and domains are represented. In our experiments, we found that a straightforward sequence prediction architecture with pretrained contextual embeddings is sufficient to reach performance levels comparable to existing systems, when separately trained on each corpus. We report performance between 81% and 96% in F1 score. We also observed that discourse segmentation models only display a moderate generalization capability, even within the same language and discourse representation scheme.

Which aspects of discourse relations are hard to learn? Primitive decomposition for discourse relation classification
Charlotte Roze | Chloé Braud | Philippe Muller
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Discourse relation classification has proven to be a hard task, with rather low performance on several corpora that notably differ on the relation set they use. We propose to decompose the task into smaller, mostly binary tasks corresponding to various primitive concepts encoded into the discourse relation definitions. More precisely, we translate the discourse relations into a set of values for attributes based on distinctions used in the mappings between discourse frameworks proposed by Sanders et al. (2018). This arguably allows for a more robust representation of discourse relations, and enables us to address usually ignored aspects of discourse relation prediction, namely multiple labels and underspecified annotations. We show experimentally which of the conceptual primitives are harder to learn from the Penn Discourse Treebank English corpus, and propose a correspondence to predict the original labels, with preliminary empirical comparisons with a direct model.

2018

Concaténation de réseaux de neurones pour la classification de tweets, DEFT2018 (Concatenation of neural networks for tweets classification, DEFT2018)
Damien Sileo | Tim Van de Cruys | Philippe Muller | Camille Pradel
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

We present the system used by the Melodi/Synapse Développement team in the DEFT2018 competition on topic and sentiment classification of French tweets. We propose a single system for both tasks that concatenates two embedding methods and three sequence representation models. The system ranks 1st out of 13 in sentiment analysis and 4th out of 13 in topic classification.

pdf bib abs
A Dependency Perspective on RST Discourse Parsing and Evaluation
Mathieu Morey | Philippe Muller | Nicholas Asher
Computational Linguistics, Volume 44, Issue 2 - June 2018

Computational text-level discourse analysis mostly happens within Rhetorical Structure Theory (RST), whose structures have classically been presented as constituency trees, and relies on data from the RST Discourse Treebank (RST-DT); as a result, the RST discourse parsing community has largely borrowed from the syntactic constituency parsing community. The standard evaluation procedure for RST discourse parsers is thus a simplified variant of PARSEVAL, and most RST discourse parsers use techniques that originated in syntactic constituency parsing. In this article, we isolate a number of conceptual and computational problems with the constituency hypothesis. We then examine the consequences, for the implementation and evaluation of RST discourse parsers, of adopting a dependency perspective on RST structures, a view advocated so far only by a few approaches to discourse parsing. While doing that, we show the importance of the notion of headedness of RST structures. We analyze RST discourse parsing as dependency parsing by adapting to RST a recent proposal in syntactic parsing that relies on head-ordered dependency trees, a representation isomorphic to headed constituency trees. We show how to convert the original trees from the RST corpus, RST-DT, and their binarized versions used by all existing RST parsers to head-ordered dependency trees. We also propose a way to convert existing simple dependency parser output to constituent trees. This allows us to evaluate and to compare approaches from both constituent-based and dependency-based perspectives in a unified framework, using constituency and dependency metrics. We thus propose an evaluation framework to compare extant approaches easily and uniformly, something the RST parsing community has lacked up to now. We can also compare parsers’ predictions to each other across frameworks. This allows us to characterize families of parsing strategies across the different frameworks, in particular with respect to the notion of headedness. Our experiments provide evidence for the conceptual similarities between dependency parsers and shift-reduce constituency parsers, and confirm that dependency parsing constitutes a viable approach to RST discourse parsing.

2017

pdf bib abs
Changement stylistique de phrases par apprentissage faiblement supervisé (Textual Style Transfer using Weakly Supervised Learning)
Damien Sileo | Camille Pradel | Philippe Muller | Tim Van de Cruys
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

Several natural language processing tasks involve modifying sentences while preserving their meaning as well as possible, such as paraphrasing, compression and simplification, each with its own data and models. We introduce a general method that addresses all of these problems using data that is easier to obtain: a set of sentences tagged with indicators of their style, such as sentences paired with the type of sentiment they express. The method relies on an unsupervised representation learning model (a variational autoencoder), then on shifting the learned representations to match a given style. The result is evaluated qualitatively, then quantitatively on the Microsoft sentence compression dataset, with encouraging results.

pdf bib abs
How much progress have we made on RST discourse parsing? A replication study of recent results on the RST-DT
Mathieu Morey | Philippe Muller | Nicholas Asher
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This article evaluates purported progress over the past years in RST discourse parsing. Several studies report a relative error reduction of 24 to 51% on all metrics that authors attribute to the introduction of distributed representations of discourse units. We replicate the standard evaluation of 9 parsers, 5 of which use distributed representations, from 8 studies published between 2013 and 2017, using their predictions on the test set of the RST-DT. Our main finding is that most recently reported increases in RST discourse parser performance are an artefact of differences in implementations of the evaluation procedure. We evaluate all these parsers with the standard Parseval procedure to provide a more accurate picture of the actual performance of RST discourse parsers in standard evaluation settings. Under this more stringent procedure, the gains attributable to distributed representations represent at most a 16% relative error reduction on fully-labelled structures.

2016

pdf bib abs
A Supervised Approach for Enriching the Relational Structure of Frame Semantics in FrameNet
Shafqat Mumtaz Virk | Philippe Muller | Juliette Conrath
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Frame semantics is a theory of linguistic meanings, and is considered to be a useful framework for shallow semantic analysis of natural language. FrameNet, which is based on frame semantics, is a popular lexical semantic resource. In addition to providing a set of core semantic frames and their frame elements, FrameNet also provides relations between those frames (hence providing a network of frames i.e. FrameNet). We address here the limited coverage of the network of conceptual relations between frames in FrameNet, which has previously been pointed out by others. We present a supervised model using rich features from three different sources: structural features from the existing FrameNet network, information from the WordNet relations between synsets projected into semantic frames, and corpus-collected lexical associations. We show large improvements over baselines consisting of each of the three groups of features in isolation. We then use this model to select frame pairs as candidate relations, and perform evaluation on a sample with good precision.

pdf bib abs
Corpus Annotation within the French FrameNet: a Domain-by-domain Methodology
Marianne Djemaa | Marie Candito | Philippe Muller | Laure Vieu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper reports on the development of a French FrameNet, within the ASFALDA project. While the first phase of the project focused on the development of a French set of frames and corresponding lexicon (Candito et al., 2014), this paper concentrates on the subsequent corpus annotation phase, which focused on four notional domains (commercial transactions, cognitive stances, causality and verbal communication). Given that full coverage is not reachable for a relatively “new” FrameNet project, we advocate that focusing on specific notional domains allowed us to obtain full lexical coverage for the frames of these domains, while partially reflecting word sense ambiguities. Furthermore, as frames and roles were annotated on two French Treebanks (the French Treebank (Abeillé and Barrier, 2004) and the Sequoia Treebank (Candito and Seddah, 2012)), we were able to extract a syntactico-semantic lexicon from the annotated frames. In the resource’s current status, there are 98 frames, 662 frame evoking words, 872 senses, and about 13000 annotated frames, with their semantic roles assigned to portions of text. The French FrameNet is freely available at alpage.inria.fr/asfalda.

pdf bib abs
A General Framework for the Annotation of Causality Based on FrameNet
Laure Vieu | Philippe Muller | Marie Candito | Marianne Djemaa
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present here a general set of semantic frames to annotate causal expressions, with a rich lexicon in French and an annotated corpus of about 5000 instances of causal lexical items with their corresponding semantic frames. The aim of our project is to have both the largest possible coverage of causal phenomena in French, across all parts of speech, and have it linked to a general semantic framework such as FN, to benefit in particular from the relations between other semantic frames, e.g., temporal ones or intentional ones, and the underlying upper lexical ontology that enable some forms of reasoning. This is part of the larger ASFALDA French FrameNet project, which focuses on a few different notional domains which are interesting in their own right (Djemaa et al., 2016), including cognitive positions and communication frames. In the process of building the French lexicon and preparing the annotation of the corpus, we had to remodel some of the frames proposed in FN based on English data, with hopefully more precise frame definitions to facilitate human annotation. This includes semantic clarifications of frames and frame elements, redundancy elimination, and added coverage. The result is arguably a significant improvement of the treatment of causality in FN itself.

2014

pdf bib
Unsupervised extraction of semantic relations using discourse cues
Juliette Conrath | Stergos Afantenos | Nicholas Asher | Philippe Muller
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Unsupervised extraction of semantic relations (Extraction non supervisée de relations sémantiques lexicales) [in French]
Juliette Conrath | Stergos Afantenos | Nicholas Asher | Philippe Muller
Proceedings of TALN 2014 (Volume 1: Long Papers)

The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us to enforce the coherence of the resulting resource, and also has the advantage that, though the number of frames is limited (around a hundred), we obtain full coverage within a given domain.

pdf bib
Predicting the relevance of distributional semantic similarity with contextual information
Philippe Muller | Cécile Fabre | Clémentine Adam
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)
Cécile Fabre | Nabil Hathout | Lydia-Mai Ho-Dac | François Morlane-Hondère | Philippe Muller | Franck Sajous | Ludovic Tanguy | Tim Van de Cruys
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

pdf bib
Presentation of the SemDis 2014 workshop: distributional semantics for two tasks - lexical substitution and exploration of specialized corpora (Présentation de l’atelier SemDis 2014 : sémantique distributionnelle pour la substitution lexicale et l’exploration de corpus spécialisés) [in French]
Cécile Fabre | Nabil Hathout | Lydia-Mai Ho-Dac | François Morlane-Hondère | Philippe Muller | Franck Sajous | Ludovic Tanguy | Tim Van de Cruys
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

2013

pdf bib
Evaluer et améliorer une ressource distributionnelle: protocole d’annotation de liens sémantiques en contexte [Evaluating and improving a distributional resource: protocol for in-context annotation of semantic links]
Clémentine Adam | Cécile Fabre | Philippe Muller
Traitement Automatique des Langues, Volume 54, Numéro 1 : Varia [Varia]

pdf bib
MELODI: Semantic Similarity of Words and Compositional Phrases using Latent Vector Weighting
Tim Van de Cruys | Stergos Afantenos | Philippe Muller
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
MELODI: A Supervised Distributional Approach for Free Paraphrasing of Noun Compounds
Tim Van de Cruys | Stergos Afantenos | Philippe Muller
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
Expressivity and comparison of models of discourse structure
Antoine Venant | Nicholas Asher | Philippe Muller | Pascal Denis | Stergos Afantenos
Proceedings of the SIGDIAL 2013 Conference

2012

pdf bib
Préface [Introduction to the special issue]
Inderjeet Mani | Philippe Muller
Traitement Automatique des Langues, Volume 53, Numéro 2 : Traitement automatique des informations temporelles et spatiales en langage naturel [Automatic Processing for Temporal and Spatial Information in Natural Language]

pdf bib
Constrained Decoding for Text-Level Discourse Parsing
Philippe Muller | Stergos Afantenos | Pascal Denis | Nicholas Asher
Proceedings of COLING 2012

This paper describes the ANNODIS resource, a discourse-level annotated corpus for French. The corpus combines two perspectives on discourse: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that are diversified with respect to genre, length and type of discursive organisation. The methodology followed here involves an iterative design of annotation guidelines in order to reach satisfactory inter-annotator agreement levels. This allows us to raise a few issues relevant for the comparison of such complex objects as discourse structures. The corpus also serves as a source of empirical evidence for discourse theories. We present here two first analyses taking advantage of this new annotated corpus --one that tested hypotheses on constraints governing discourse structure, and another that studied the variations in composition and signalling of multi-level discourse structures.

2011

pdf bib abs
Comparaison d’une approche miroir et d’une approche distributionnelle pour l’extraction de mots sémantiquement reliés (Comparing a mirror approach and a distributional approach for extracting semantically related words)
Philippe Muller | Philippe Langlais
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In (Muller & Langlais, 2010), we compared a distributional approach and a variant of the mirror approach proposed by Dyvik (2002) on a synonym extraction task from a French corpus. Here we present a finer-grained analysis of the automatically extracted relations, this time focusing on English, for which more extensive resources are available. Several ways of evaluating our approach corroborate that the mirror approach performs better overall than the distributional approach described in (Lin, 1998), a reference approach in the field.

2010

pdf bib abs
Une évaluation de l’impact des types de textes sur la tâche de segmentation thématique
Clémentine Adam | Philippe Muller | Cécile Fabre
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This study aims to contribute to defining the objectives of topic segmentation (TS) by encouraging consideration of text type as a parameter of the task. Our hypothesis is that, while TS is certainly relevant for texts whose organization is indeed topical, it is not suited to other modes of organization (temporal, rhetorical) and cannot be applied without caution to arbitrary texts. By comparing the performance of a TS system on two corpora, with “strong” and “weak” topical organization, we show that the task is indeed sensitive to the nature of the texts.

pdf bib
Comparaison de ressources lexicales pour l’extraction de synonymes
Philippe Muller | Philippe Langlais
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

pdf bib
Comparison of different algebras for inducing the temporal structure of texts
Pascal Denis | Philippe Muller
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib abs
Learning Recursive Segments for Discourse Parsing
Stergos Afantenos | Pascal Denis | Philippe Muller | Laurence Danlos
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation has relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). Unfortunately, this assumption turns out to be too strong, for some theories of discourse, like the "Segmented Discourse Representation Theory" or SDRT, allow for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques making use of a regularized maximum entropy model, combined with a simple repairing heuristic that enforces global coherence. Our system was developed and evaluated on the first round of annotations provided by the French Annodis project (an ongoing effort to create a discourse bank for French). Cross-validated on only 47 documents (1,445 EDUs), our system achieves encouraging performance results with an F-score of 73% for finding EDUs.

2009

The ANNODIS project aims to build a corpus of texts annotated at the discourse level and to develop tools for corpus annotation and exploitation. The annotations adopt two complementary points of view: a bottom-up perspective starts from minimal discourse units and builds complex structures through a set of discourse relations; a top-down perspective considers the text as a whole and relies on pre-identified cues to detect high-level discourse structures. Corpus construction is accompanied by the creation of two interfaces: the first supports manual annotation of discourse relations and structures by displaying the markup produced by preprocessing; the second will be dedicated to exploiting the annotations. We present the annotation models and protocols designed to carry out the annotation campaign through the dedicated interface.

2008

pdf bib abs
Annotation d’expressions temporelles et d’événements en français
Gabriel Parent | Michel Gagnon | Philippe Muller
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In this article, we propose a method for identifying, in a French text, all adverbial expressions of temporal location, as well as all verbs, nouns and adjectives denoting an eventuality (event or state). In addition to identifying these expressions, the method extracts some semantic information: the value of the temporal location according to the TimeML standard, and the type of the eventualities. For adverbial expressions of temporal location we use a cascade of automata, whereas for identifying events and states we rely on a full parse of the sentence. Our results are close to those of comparable work on English, in the absence of a similar quantitative evaluation on French.

pdf bib abs
Evaluation Metrics for Automatic Temporal Annotation of Texts
Xavier Tannier | Philippe Muller
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Recent years have seen increasing attention to the temporal processing of texts, as well as considerable effort to standardize temporal information in natural language. A central part of this information lies in the temporal relations between events described in a text, when their precise times or dates are not known. Reliable human annotation of such information is difficult, and automatic comparisons must follow procedures beyond mere precision-recall of local pieces of information, since a coherent picture can only be considered at a global level. We address the problem of evaluation metrics for such information, aiming at fair comparisons between systems, by proposing some measures that take into account the globality of a text.

The article presents a disambiguation method in which the sense is determined using a dictionary. The method is based on an algorithm that computes a “semantic” distance between the words of the dictionary, taking into account the full topology of the dictionary, viewed as a graph over its entries. We tested it on the disambiguation of the dictionary definitions themselves. The article presents preliminary results, which are very encouraging for a method that requires no annotated corpus.

pdf bib abs
Une méthode pour l’annotation de relations temporelles dans des textes et son évaluation
Philippe Muller | Xavier Tannier
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article addresses the automatic annotation of temporal information in texts, and more specifically the relations between events introduced by the verbs in each clause. While this problem has attracted many researchers on the theoretical side, it remains largely unexplored as far as systematic automatic annotation (and its evaluation) is concerned, even though some methodology is emerging for having humans perform the task. We propose both a method for carrying out the task automatically and a way of measuring how far the objective is met. We tested its feasibility on newswire dispatches, with encouraging first results.

pdf bib
Annotating and measuring temporal relations in texts
Philippe Muller | Xavier Tannier
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Word Sense Disambiguation using a dictionary for sense similarity measure
Bruno Gaume | Nabil Hathout | Philippe Muller
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics