Joseph Le Roux

2025

Bregman Conditional Random Fields: Sequence Labeling with Parallelizable Inference Algorithms
Caio Corro | Mathieu Lacroix | Joseph Le Roux
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a novel discriminative model for sequence labeling called Bregman conditional random fields (BCRF). Contrary to standard linear-chain conditional random fields, BCRF allows fast parallelizable inference algorithms based on iterative Bregman projections. We show how such models can be learned using Fenchel-Young losses, including an extension for learning from partial labels. Experimentally, our approach delivers comparable results to CRF while being faster, and achieves better results in highly constrained settings compared to mean field, another parallelizable alternative.

Scaling Graph-Based Dependency Parsing with Arc Vectorization and Attention-Based Refinement
Nicolas Floquet | Joseph Le Roux | Nadi Tomeh | Thierry Charnois
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

We propose a novel architecture for graph-based dependency parsing that explicitly constructs vectors, from which both arcs and labels are scored. Our method addresses key limitations of the standard two-pipeline approach by unifying arc scoring and labeling into a single network, reducing scalability issues caused by the information bottleneck and lack of parameter sharing. Additionally, our architecture overcomes limited arc interactions with transformer layers to efficiently simulate higher-order dependencies. Experiments on PTB and UD show that our model outperforms state-of-the-art parsers in both accuracy and efficiency.

2023

Attention sur les spans pour l’analyse syntaxique en constituants
Nicolas Floquet | Nadi Tomeh | Joseph Le Roux | Thierry Charnois
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : travaux de recherche originaux -- articles courts

We present an extension to modern neural constituency parsers that equips candidate constituents with a vector representation refined according to context through several successive applications of an efficient transformer-like module (attention pooling followed by a non-linear transformation). We apply this extension to the CRF parser of Yu Zhang et al. Experimentally, we test this extension on two corpora (PTB and FTB), with or without dynamic word embeddings: the extension yields a consistent gain in all configurations.

2022

Higher-Order Dependency Parsing for Arc-Polynomial Score Functions via Gradient-Based Methods and Genetic Algorithm
Xudong Zhang | Joseph Le Roux | Thierry Charnois
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a novel method for higher-order dependency parsing which takes advantage of the general form of score functions written as arc-polynomials, a general framework which encompasses common higher-order score functions and includes new ones. This method is based on non-linear optimization techniques, namely coordinate ascent and genetic search, where we iteratively update a candidate parse. Updates are formulated as gradient-based operations, and are efficiently computed by auto-differentiation libraries. Experiments show that this method obtains results matching the recent state-of-the-art second-order parsers on three standard datasets.

Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs
Ghazi Felhi | Joseph Le Roux | Djamé Seddah
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose a generative model for text generation, which exhibits disentangled latent representations of syntax and semantics. Contrary to previous work, this model does not need syntactic information such as constituency parses, or semantic information such as paraphrase pairs. Our model relies solely on the inductive bias found in attention-based architectures such as Transformers. In the attention of Transformers, keys handle information selection while values specify what information is conveyed. Our model, dubbed QKVAE, uses Attention in its decoder to read latent variables where one latent variable infers keys while another infers values. We run experiments on latent representations and experiments on syntax/semantics transfer which show that QKVAE displays clear signs of disentangled syntax and semantics. We also show that our model displays competitive syntax transfer capabilities when compared to supervised models and that comparable supervised models need a fairly large amount of data (more than 50K samples) to outperform it on both syntactic and semantic transfer. The code for our experiments is publicly available.

AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization
Moussa Kamal Eddine | Nadi Tomeh | Nizar Habash | Joseph Le Roux | Michalis Vazirgiannis
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

As with most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora. While most existing models focus on English, Arabic remains understudied. In this paper we propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART. We show that AraBART achieves the best performance on multiple abstractive summarization datasets, outperforming strong baselines including a pretrained Arabic BERT-based model, multilingual BART, Arabic T5, and a multilingual T5 model. AraBART is publicly available.

2021

Challenging the Semi-Supervised VAE Framework for Text Classification
Ghazi Felhi | Joseph Le Roux | Djamé Seddah
Proceedings of the Second Workshop on Insights from Negative Results in NLP

Semi-Supervised Variational Autoencoders (SSVAEs) are widely used models for data-efficient learning. In this paper, we question the adequacy of the standard design of sequence SSVAEs for the task of text classification as we exhibit two sources of overcomplexity for which we provide simplifications. These simplifications to SSVAEs preserve their theoretical soundness while providing a number of practical advantages in the semi-supervised setup where the result of training is a text classifier. These simplifications are the removal of (i) the Kullback-Leibler divergence from its objective and (ii) the fully unobserved latent variable from its probabilistic model. These changes relieve users from choosing a prior for their latent variables, make the model smaller and faster, and allow for a better flow of information into the latent variables. We compare the simplified versions to standard SSVAEs on 4 text classification tasks. On top of the above-mentioned simplification, experiments show a speed-up of 26%, while keeping equivalent classification scores. The code to reproduce our experiments is public.

Strength in Numbers: Averaging and Clustering Effects in Mixture of Experts for Graph-Based Dependency Parsing
Xudong Zhang | Joseph Le Roux | Thierry Charnois
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

We review two features of mixture of experts (MoE) models which we call averaging and clustering effects in the context of graph-based dependency parsers learned in a supervised probabilistic framework. Averaging corresponds to the ensemble combination of parsers and is responsible for variance reduction, which helps stabilize and improve parsing accuracy. Clustering describes the capacity of MoE models to give more credit to experts believed to be more accurate given an input. Although promising, this is difficult to achieve, especially without additional data. We design an experimental set-up to study the impact of these effects. Whereas averaging is always beneficial, clustering requires good initialization and stabilization techniques, but its advantages over mere averaging seem to eventually vanish when enough experts are present. As a by-product, we show how this leads to state-of-the-art results on the PTB and the CoNLL09 Chinese treebank, with low variance across experiments.
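
The two effects named in this abstract can be illustrated with a toy mixture over head-attachment probabilities. This is only a minimal sketch under stated assumptions, not the paper's model: the function name `mixture`, the list-of-lists representation, and the example numbers are all hypothetical. Uniform weights correspond to plain averaging, while input-dependent weights stand in for the clustering behaviour.

```python
def mixture(expert_probs, weights=None):
    """Combine per-expert probability distributions over candidate heads.

    expert_probs: list of distributions (one list of floats per expert).
    weights: optional mixture weights summing to 1; when omitted, experts
             are averaged uniformly (the "averaging" effect). Non-uniform,
             input-dependent weights mimic the "clustering" effect.
    """
    k = len(expert_probs)
    if weights is None:
        weights = [1.0 / k] * k
    n = len(expert_probs[0])
    return [sum(w * p[i] for w, p in zip(weights, expert_probs)) for i in range(n)]


# Hypothetical distributions over two candidate heads from two experts.
experts = [[0.7, 0.3], [0.5, 0.5]]
averaged = mixture(experts)               # uniform averaging
gated = mixture(experts, [0.9, 0.1])      # trust the first expert more
```

In this sketch the gate is passed in by hand; in an actual MoE parser it would be predicted from the input sentence.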

2020

Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations
Yash Kankanampati | Joseph Le Roux | Nadi Tomeh | Dima Taji | Nizar Habash
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we present a parsing model for projective dependency trees which takes advantage of the existence of complementary dependency annotations, as is the case for Arabic with the availability of the CATiB and UD treebanks. Our system performs syntactic parsing according to both annotation types jointly as a sequence of arc-creating operations, and partially created trees for one annotation are also available to the other as features for the score function. This method gives an error reduction of 9.9% on CATiB and 6.1% on UD compared to a strong baseline, and ablation tests show that the main contribution to this reduction comes from sharing tree representations between tasks, and not simply from sharing BiLSTM layers as is often done in NLP multitask systems.
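
For readers unfamiliar with the measure, the error reduction figures quoted above are presumably relative error reductions, that is, the share of the baseline's errors that the new system removes. The abstract does not spell out the formula, so the sketch below rests on that assumption; the function name and the scores in the usage line are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative error reduction (in %) between two accuracy scores given in %.

    Assumes error = 100 - accuracy, which is the usual convention for
    attachment scores; this is an assumption, not a detail from the paper.
    """
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    if baseline_err <= 0:
        raise ValueError("baseline has no errors to reduce")
    return 100.0 * (baseline_err - new_err) / baseline_err


# Hypothetical scores for illustration only: going from 90.0 to 91.0
# removes one tenth of the baseline's errors.
reduction = relative_error_reduction(90.0, 91.0)
```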

Calcul de similarité entre phrases : quelles mesures et quels descripteurs ? (Sentence Similarity: a study on similarity metrics with words and character strings)
Davide Buscaldi | Ghazi Felhi | Dhaou Ghoul | Joseph Le Roux | Gaël Lejeune | Xudong Zhang
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes

This article presents our participation in the 2020 edition of the Défi Fouille de Textes (DEFT 2020), and more specifically in the two tasks dealing with sentence similarity. In our work we focused on two questions: the choice of the similarity measure on the one hand, and the choice of the operands to which the similarity measure is applied on the other. In particular, we studied whether words or character strings (words or non-words) should be used. We show, on the one hand, that Bray-Curtis similarity can be more effective and above all more stable than cosine similarity and, on the other hand, that computing similarity over character strings is more effective than the same computation over words.
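
The two measures compared in this abstract can be sketched over character n-gram counts as follows. This is a minimal illustration under stated assumptions, not the authors' system: the helper names, the choice of raw counts as features, and the n-gram length are hypothetical. Bray-Curtis similarity is taken here as one minus the standard Bray-Curtis dissimilarity.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count the overlapping character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def bray_curtis_similarity(a, b):
    """1 - Bray-Curtis dissimilarity between two count vectors (Counters)."""
    keys = set(a) | set(b)
    num = sum(abs(a[k] - b[k]) for k in keys)   # Counter returns 0 for absent keys
    den = sum(a[k] + b[k] for k in keys)
    return 1.0 - num / den if den else 1.0


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Character bigrams of two short strings, for illustration only.
x = char_ngrams("abcd", n=2)
y = char_ngrams("abce", n=2)
bc = bray_curtis_similarity(x, y)
cos = cosine_similarity(x, y)
```

Working on characters rather than words means the same functions apply to any string without tokenization, which is the setting the abstract reports as more effective.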

2019

Indexation et appariements de documents cliniques pour le Deft 2019 (Indexing and pairing texts of the medical domain)
Davide Buscaldi | Dhaou Ghoul | Joseph Le Roux | Gaël Lejeune
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Défi Fouille de Textes (atelier TALN-RECITAL)

In this article, we present our methods for the indexing and matching tasks of the Défi Fouille de Textes (Deft) 2019. For the indexing task we tested two methods: one based on first matching the documents of the test set with the documents of the training set, and another based on terminological annotation. Unfortunately, these methods gave rather poor results. For the matching task, we developed a training-free method based on character-string similarities as well as a method using Siamese networks. Here again the results were rather disappointing, although the unsupervised method reaches a fairly respectable score for an unsupervised method: 62%.

Representation Learning and Dynamic Programming for Arc-Hybrid Parsing
Joseph Le Roux | Antoine Rozenknop | Mathieu Lacroix
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We present a new method for transition-based parsing where a solution is a pair made of a dependency tree and a derivation graph describing the construction of the former. From this representation we are able to derive an efficient parsing algorithm and design a neural network that learns vertex representations and arc scores. Experimentally, although we only train via local classifiers, our approach improves over previous arc-hybrid systems and reaches state-of-the-art parsing accuracy.

2018

Modèles en Caractères pour la Détection de Polarité dans les Tweets (Character-level Models for Polarity Detection in Tweets)
Davide Buscaldi | Joseph Le Roux | Gaël Lejeune
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

In this article, we present our contribution to the Défi Fouille de Textes 2018 through three original methods for topic classification and polarity detection in French tweets, to which we added a voting system. Our first method is based on lexicons (words and emojis), character n-grams, and a large-margin classifier (SVM), while the other two are endogenous methods based on character-level feature extraction: a bidirectional long short-term memory model (BiLSTM) with a multi-layer perceptron on the one hand, and a model of frequent closed character sequences with an SVM classifier on the other. The BiLSTM produced by far the best results: it ranked first on task 1, binary classification of tweets according to whether or not they are about transport, and third on task 2, polarity classification into 4 classes. This result is all the more interesting because the proposed method is lightly parameterized, fully endogenous, and involves no preprocessing.

2017

Efficient Discontinuous Phrase-Structure Parsing via the Generalized Maximum Spanning Arborescence
Caio Corro | Joseph Le Roux | Mathieu Lacroix
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a new method for the joint task of tagging and non-projective dependency parsing. We demonstrate its usefulness with an application to discontinuous phrase-structure parsing where decoding lexicalized spines and syntactic derivations is performed jointly. The main contributions of this paper are (1) a reduction from joint tagging and non-projective dependency parsing to the Generalized Maximum Spanning Arborescence problem, and (2) a novel decoding algorithm for this problem through Lagrangian relaxation. We evaluate this model and obtain state-of-the-art results despite strong independence assumptions.

Transforming Dependency Structures to LTAG Derivation Trees
Caio Corro | Joseph Le Roux
Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms

2016

Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework
Matthieu Constant | Joseph Le Roux | Nadi Tomeh
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Dependency Parsing with Bounded Block Degree and Well-nestedness via Lagrangian Relaxation and Branch-and-Bound
Caio Corro | Joseph Le Roux | Mathieu Lacroix | Antoine Rozenknop | Roberto Wolfler Calvo
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

We present a two-stage architecture for syntactic parsing. First, a phrase-structure parser builds, for each sentence, a list of analyses that are converted into dependency trees. These trees are then rescored by a discriminative reranker. This method makes it possible to take into account information to which the parser has no access, in particular functional annotations. We validate our approach with an evaluation on the Paris 7 treebank. The second stage significantly improves the quality of the returned analyses, whatever the metric used.

MACAON An NLP Tool Suite for Processing Word Lattices
Alexis Nasr | Frédéric Béchet | Jean-François Rey | Benoît Favre | Joseph Le Roux
Proceedings of the ACL-HLT 2011 System Demonstrations

2010

Handling Unknown Words in Statistical Latent-Variable Parsing Models for Arabic, English and French
Mohammed Attia | Jennifer Foster | Deirdre Hogan | Joseph Le Roux | Lamia Tounsi | Josef van Genabith
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

2009

Analyse déductive pour les grammaires d’interaction
Joseph Le Roux
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We propose a parsing algorithm for interaction grammars that uses the formal framework of deductive parsing. This approach gives a new perspective on the problem, since previous methods reduced it to graph rewriting and used constraint-solving techniques. Moreover, this presentation makes it possible to describe the process in a standard way and to expose the sources of non-determinism that make this problem hard.

Deductive Parsing in Interaction Grammars
Joseph Le Roux
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2008

Calculs d’unification sur les arbres de dérivation TAG
Sylvain Schmitz | Joseph Le Roux
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

We define a formalism, feature-based regular tree grammars, and a translation from feature-based tree adjoining grammars into this new formalism. This translation preserves the derivation structures of the original grammar while taking feature unification into account. The construction can be applied to surface realizers that rely on derivation trees.

Feature Unification in TAG Derivation Trees
Sylvain Schmitz | Joseph Le Roux
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)

2006

Modélisation de la coordination dans les grammaires d’interaction [Modeling coordination in interaction grammars]
Joseph Le Roux | Guy Perrier
Traitement Automatique des Langues, Volume 47, Numéro 3 : Varia [Varia]

XMG - An Expressive Formalism for Describing Tree-Based Grammars
Yannick Parmentier | Joseph Le Roux | Benoît Crabbé
Demonstrations

A Constraint Driven Metagrammar
Joseph Le Roux | Benoît Crabbé | Yannick Parmentier
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms

2005

XMG : un Compilateur de Méta-Grammaires Extensible
Denys Duchier | Joseph Le Roux | Yannick Parmentier
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In this article, we present a tool for automatically producing linguistic resources, in this case grammars. The tool is characterized by its extensibility, both in terms of the grammatical formalisms supported (currently tree adjoining grammars and interaction grammars) and in terms of its modular architecture, which makes it easy to integrate new modules that check the validity of the structures produced. In addition, the tool offers support suited to developing grammars with a semantic dimension.