Yannick Toussaint


Organizing and Improving a Database of French Word Formation Using Formal Concept Analysis
Nyoman Juniarta | Olivier Bonami | Nabil Hathout | Fiammetta Namer | Yannick Toussaint
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We apply Formal Concept Analysis (FCA) to organize and improve the quality of Démonette2, a French derivational database, by detecting both missing and spurious derivations in the database. We represent each derivational family as a graph. Given that the subgraph relation holds among derivational families, FCA can group families and represent them in a partially ordered set (poset). This poset is also useful for improving the database. A family is regarded as a possible anomaly (meaning that it may have missing and/or spurious derivations) if its derivational graph is almost, but not completely, identical to those of a large number of other families.


Do sentence embeddings capture discourse properties of sentences from Scientific Abstracts?
Laurine Huber | Chaker Memmadi | Mathilde Dargnat | Yannick Toussaint
Proceedings of the First Workshop on Computational Approaches to Discourse

We introduce four tasks designed to determine which sentence encoders best capture discourse properties of sentences from scientific abstracts, namely coherence and cohesion between clauses of a sentence, and discourse relations within sentences. We show that even if contextual encoders such as BERT or SciBERT encode the coherence in discourse units, they do not help to predict three discourse relations commonly used in scientific abstracts. We discuss what these results underline, namely that these discourse relations are based on particular phrasing that allows non-contextual encoders to perform well.


Aligning Discourse and Argumentation Structures using Subtrees and Redescription Mining
Laurine Huber | Yannick Toussaint | Charlotte Roze | Mathilde Dargnat | Chloé Braud
Proceedings of the 6th Workshop on Argument Mining

In this paper, we investigate similarities between discourse and argumentation structures by aligning subtrees in a corpus containing both annotations. Contrary to previous work, we focus on comparing sub-structures and not only relation matches. Using data mining techniques, we show that discourse and argumentation most often align well, and that the double annotation makes it possible to derive a mapping between structures. Moreover, this approach enables the study of similarities between discourse structures and differences in their expressive power.


Syntax-based Transfer Learning for the Task of Biomedical Relation Extraction
Joël Legrand | Yannick Toussaint | Chedy Raïssi | Adrien Coulet
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

Transfer learning (TL) aims to enhance machine learning performance on a problem by reusing labeled data originally designed for a related problem. In particular, domain adaptation consists, for a specific task, in reusing training data developed for the same task but in a distinct domain. This is particularly relevant to the applications of deep learning in Natural Language Processing, because those usually require large annotated corpora that may not exist for the targeted domain, but do exist for related domains. In this paper, we experiment with TL for the task of Relation Extraction (RE) from biomedical texts, using the TreeLSTM model. We empirically show the impact of TreeLSTM alone and with domain adaptation by obtaining better performance than the state of the art on two biomedical RE tasks and equal performance on two others, for which little annotated data is available. Furthermore, we propose an analysis of the role that syntactic features may play in TL for RE.


Ambiguity Diagnosis for Terms in Digital Humanities
Béatrice Daille | Evelyne Jacquey | Gaël Lejeune | Luis Felipe Melo | Yannick Toussaint
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Among all the research dedicated to terminology and word sense disambiguation, little attention has been devoted to the ambiguity of term occurrences. Even if a lexical unit is indeed a term of the domain, it is not true, even in a specialised corpus, that all its occurrences are terminological. Some occurrences are terminological and others are not. Thus, a global decision at the corpus level about the terminological status of all occurrences of a lexical unit would be erroneous. In this paper, we propose three original methods to characterise the ambiguity of term occurrences in the domain of social sciences for French. These methods model the context of the term occurrences differently: the first relies on text mining, the second is based on textometry, and the third focuses on text genre properties. The experimental results show the potential of the proposed approaches and give an opportunity to discuss their hybridisation.


Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs
Mohsen Hassan | Olfa Makkaoui | Adrien Coulet | Yannick Toussaint
Proceedings of BioNLP 15


Le traitement automatique de la langue contre les erreurs judiciaires : une méthodologie d’analyse systématique des textes d’un dossier d’instruction
Yannick Toussaint
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

This article presents a method for the systematic, scientific analysis of the documents that make up an investigation file (dossier d’instruction). The aim of this approach is to give the investigating judge new means of assessing the coherence, inconsistencies, stability, or variations in witness statements. This should allow the judge to identify leads for further investigation. We describe the work we carried out on a real case file, then propose a method for analysing the results.