Pascal Denis


2022

pdf bib
Chop and Change: Anaphora Resolution in Instructional Cooking Videos
Cennet Oguz | Ivana Kruijff-Korbayova | Emmanuel Vincent | Pascal Denis | Josef van Genabith
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022

Linguistic ambiguities arising from changes in entities in action flows are a key challenge in instructional cooking videos. In particular, temporally evolving entities present rich and to date understudied challenges for anaphora resolution. For example “oil” mixed with “salt” is later referred to as a “mixture”. In this paper we propose novel annotation guidelines to annotate recipes for the anaphora resolution task, reflecting change in entities. Moreover, we present experimental results for end-to-end multimodal anaphora resolution with the new annotation scheme and propose the use of temporal features for performance improvement.

2021

pdf bib
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale
Pascal Denis | Natalia Grabar | Amel Fraisse | Rémi Cardon | Bernard Jacquemin | Eric Kergosien | Antonio Balvet
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

pdf bib
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 : 23e REncontres jeunes Chercheurs en Informatique pour le TAL (RECITAL)
Pascal Denis | Natalia Grabar | Amel Fraisse | Rémi Cardon | Bernard Jacquemin | Eric Kergosien | Antonio Balvet
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 : 23e REncontres jeunes Chercheurs en Informatique pour le TAL (RECITAL)

pdf bib
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 : Démonstrations
Pascal Denis | Natalia Grabar | Amel Fraisse | Rémi Cardon | Bernard Jacquemin | Eric Kergosien | Antonio Balvet
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 : Démonstrations

pdf bib
An End-to-End Approach for Full Bridging Resolution
Joseph Renner | Priyansh Trivedi | Gaurav Maheshwari | Rémi Gilleron | Pascal Denis
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

In this article, we describe our submission to the CODI-CRAC 2021 Shared Task on Anaphora Resolution in Dialogues – Track BR (Gold). We demonstrate the performance of an end-to-end transformer-based higher-order coreference model finetuned for the task of full bridging. We find that while our approach is not effective at modeling the complexities of the task, it performs well on bridging resolution, suggesting a need for investigations into a robust anaphor identification model for future improvements.

2020

pdf bib
Joint Learning of the Graph and the Data Representation for Graph-Based Semi-Supervised Learning
Mariana Vargas-Vieyra | Aurélien Bellet | Pascal Denis
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

Graph-based semi-supervised learning is appealing when labels are scarce but large amounts of unlabeled data are available. These methods typically use a heuristic strategy to construct the graph based on some fixed data representation, independently of the available labels. In this pa- per, we propose to jointly learn a data representation and a graph from both labeled and unlabeled data such that (i) the learned representation indirectly encodes the label information injected into the graph, and (ii) the graph provides a smooth topology with respect to the transformed data. Plugging the resulting graph and representation into existing graph-based semi-supervised learn- ing algorithms like label spreading and graph convolutional networks, we show that our approach outperforms standard graph construction methods on both synthetic data and real datasets.

pdf bib
Integrating knowledge graph embeddings to improve mention representation for bridging anaphora resolution
Onkar Pandit | Pascal Denis | Liva Ralaivola
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

Lexical semantics and world knowledge are crucial for interpreting bridging anaphora. Yet, existing computational methods for acquiring and injecting this type of information into bridging resolution systems suffer important limitations. Based on explicit querying of external knowledge bases, earlier approaches are computationally expensive (hence, hardly scalable) and they map the data to be processed into high-dimensional spaces (careful handling of the curse of dimensionality and overfitting has to be in order). In this work, we take a different and principled approach which naturally addresses these issues. Specifically, we convert the external knowledge source (in this case, WordNet) into a graph, and learn embeddings of the graph nodes of low dimension to capture the crucial features of the graph topology and, at the same time, rich semantic information. Once properly identified from the mention text spans, these low dimensional graph node embeddings are combined with distributional text-based embeddings to provide enhanced mention representations. We illustrate the effectiveness of our approach by evaluating it on commonly used datasets, namely ISNotes and BASHI. Our enhanced mention representations yield significant accuracy improvements on both datasets when compared to different standalone text-based mention representations.

2019

pdf bib
Phylogenic Multi-Lingual Dependency Parsing
Mathieu Dehouck | Pascal Denis
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Languages evolve and diverge over time. Their evolutionary history is often depicted in the shape of a phylogenetic tree. Assuming parsing models are representations of their languages grammars, their evolution should follow a structure similar to that of the phylogenetic tree. In this paper, drawing inspiration from multi-task learning, we make use of the phylogenetic tree to guide the learning of multi-lingual dependency parsers leveraging languages structural similarities. Experiments on data from the Universal Dependency project show that phylogenetic training is beneficial to low resourced languages and to well furnished languages families. As a side product of phylogenetic training, our model is able to perform zero-shot parsing of previously unseen languages.

2018

pdf bib
A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images
Melissa Ailem | Bowen Zhang | Aurelien Bellet | Pascal Denis | Fei Sha
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Several recent studies have shown the benefits of combining language and perception to infer word embeddings. These multimodal approaches either simply combine pre-trained textual and visual representations (e.g. features extracted from convolutional neural networks), or use the latter to bias the learning of textual word embeddings. In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple together a skip-gram model for co-occurrence in linguistic data and a generative latent variable model for visual data. Extensive experimental studies validate the proposed model. Concretely, on the tasks of assessing pairwise word similarity and image/caption retrieval, our approach attains equally competitive or stronger results when compared to other state-of-the-art multimodal models.

pdf bib
A Framework for Understanding the Role of Morphology in Universal Dependency Parsing
Mathieu Dehouck | Pascal Denis
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper presents a simple framework for characterizing morphological complexity and how it encodes syntactic information. In particular, we propose a new measure of morpho-syntactic complexity in terms of governor-dependent preferential attachment that explains parsing performance. Through experiments on dependency parsing with data from Universal Dependencies (UD), we show that representations derived from morphological attributes deliver important parsing performance improvements over standard word form embeddings when trained on the same datasets. We also show that the new morpho-syntactic complexity measure is predictive of the gains provided by using morphological attributes over plain forms on parsing scores, making it a tool to distinguish languages using morphology as a syntactic marker from others.

2017

pdf bib
Delexicalized Word Embeddings for Cross-lingual Dependency Parsing
Mathieu Dehouck | Pascal Denis
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper presents a new approach to the problem of cross-lingual dependency parsing, aiming at leveraging training data from different source languages to learn a parser in a target language. Specifically, this approach first constructs word vector representations that exploit structural (i.e., dependency-based) contexts but only considering the morpho-syntactic information associated with each word and its contexts. These delexicalized word embeddings, which can be trained on any set of languages and capture features shared across languages, are then used in combination with standard language-specific features to train a lexicalized parser in the target language. We evaluate our approach through experiments on a set of eight different languages that are part the Universal Dependencies Project. Our main results show that using such delexicalized embeddings, either trained in a monolingual or multilingual fashion, achieves significant improvements over monolingual baselines.

pdf bib
Online Learning of Task-specific Word Representations with a Joint Biconvex Passive-Aggressive Algorithm
Pascal Denis | Liva Ralaivola
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper presents a new, efficient method for learning task-specific word vectors using a variant of the Passive-Aggressive algorithm. Specifically, this algorithm learns a word embedding matrix in tandem with the classifier parameters in an online fashion, solving a bi-convex constrained optimization at each iteration. We provide a theoretical analysis of this new algorithm in terms of regret bounds, and evaluate it on both synthetic data and NLP classification problems, including text classification and sentiment analysis. In the latter case, we compare various pre-trained word vectors to initialize our word embedding matrix, and show that the matrix learned by our algorithm vastly outperforms the initial matrix, with performance results comparable or above the state-of-the-art on these tasks.

2016

pdf bib
Learning Connective-based Word Representations for Implicit Discourse Relation Identification
Chloé Braud | Pascal Denis
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

pdf bib
Comparing Word Representations for Implicit Discourse Relation Classification
Chloé Braud | Pascal Denis
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
Combining Natural and Artificial Examples to Improve Implicit Discourse Relation Identification
Chloé Braud | Pascal Denis
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
Expressivity and comparison of models of discourse structure
Antoine Venant | Nicholas Asher | Philippe Muller | Pascal Denis | Stergos Afantenos
Proceedings of the SIGDIAL 2013 Conference

pdf bib
Automatically identifying implicit discourse relations using annotated data and raw corpora (Identification automatique des relations discursives « implicites » à partir de données annotées et de corpus bruts) [in French]
Chloé Braud | Pascal Denis
Proceedings of TALN 2013 (Volume 1: Long Papers)

pdf bib
Learning a hierarchy of specialized pairwise models for coreference resolution (Apprentissage d’une hiérarchie de modèles à paires spécialisés pour la résolution de la coréférence) [in French]
Emmanuel Lassalle | Pascal Denis
Proceedings of TALN 2013 (Volume 1: Long Papers)

pdf bib
Improving pairwise coreference models through feature space hierarchy learning
Emmanuel Lassalle | Pascal Denis
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Constrained Decoding for Text-Level Discourse Parsing
Philippe Muller | Stergos Afantenos | Pascal Denis | Nicholas Asher
Proceedings of COLING 2012

2011

pdf bib
French TimeBank : un corpus de référence sur la temporalité en français (French TimeBank: a reference corpus on temporality in French)
André Bittar | Pascal Amsili | Pascal Denis
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article a un double objectif : d’une part, il s’agit de présenter à la communauté un corpus récemment rendu public, le French Time Bank (FTiB), qui consiste en une collection de textes journalistiques annotés pour les temps et les événements selon la norme ISO-TimeML ; d’autre part, nous souhaitons livrer les résultats et réflexions méthodologiques que nous avons pu tirer de la réalisation de ce corpus de référence, avec l’idée que notre expérience pourra s’avérer profitable au-delà de la communauté intéressée par le traitement de la temporalité.

pdf bib
FreDist : Construction automatique d’un thésaurus distributionnel pour le Français (FreDist : Automatic construction of distributional thesauri for French)
Enrique Henestroza Anguiano | Pascal Denis
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Dans cet article, nous présentons FreDist, un logiciel libre pour la construction automatique de thésaurus distributionnels à partir de corpus de texte, ainsi qu’une évaluation des différents ressources ainsi produites. Suivant les travaux de (Lin, 1998) et (Curran, 2004), nous utilisons un corpus journalistique de grande taille et implémentons différentes options pour : le type de relation contexte lexical, la fonction de poids, et la fonction de mesure de similarité. Prenant l’EuroWordNet français et le WOLF comme références, notre évaluation révèle, de manière originale, que c’est l’approche qui combine contextes linéaires (ici, de type bigrammes) et contextes syntaxiques qui semble fournir le meilleur thésaurus. Enfin, nous espérons que notre logiciel, distribué avec nos meilleurs thésaurus pour le français, seront utiles à la communauté TAL.

pdf bib
French TimeBank: An ISO-TimeML Annotated Reference Corpus
André Bittar | Pascal Amsili | Pascal Denis | Laurence Danlos
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Comparison of different algebras for inducing the temporal structure of texts
Pascal Denis | Philippe Muller
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Benchmarking of Statistical Dependency Parsers for French
Marie Candito | Joakim Nivre | Pascal Denis | Enrique Henestroza Anguiano
Coling 2010: Posters

pdf bib
Statistical French Dependency Parsing: Treebank Conversion and First Results
Marie Candito | Benoît Crabbé | Pascal Denis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We first describe the automatic conversion of the French Treebank (Abeillé and Barrier, 2004), a constituency treebank, into typed projective dependency trees. In order to evaluate the overall quality of the resulting dependency treebank, and to quantify the cases where the projectivity constraint leads to wrong dependencies, we compare a subset of the converted treebank to manually validated dependency trees. We then compare the performance of two treebank-trained parsers that output typed dependency parses. The first parser is the MST parser (Mcdonald et al., 2006), which we directly train on dependency trees. The second parser is a combination of the Berkeley parser (Petrov et al., 2006) and a functional role labeler: trained on the original constituency treebank, the Berkeley parser first outputs constituency trees, which are then labeled with functional roles, and then converted into dependency trees. We found that used in combination with a high-accuracy French POS tagger, the MST parser performs a little better for unlabeled dependencies (UAS=90.3% versus 89.6%), and better for labeled dependencies (LAS=87.6% versus 85.6%).

pdf bib
Learning Recursive Segments for Discourse Parsing
Stergos Afantenos | Pascal Denis | Philippe Muller | Laurence Danlos
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation have relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). Unfortunately, this assumption turns out to be too strong, for some theories of discourse, like the ""Segmented Discourse Representation Theory"" or SDRT, allow for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques making use of a regularized maximum entropy model, combined with a simple repairing heuristic that enforces global coherence. Our system was developed and evaluated on the first round of annotations provided by the French Annodis project (an ongoing effort to create a discourse bank for French). Cross-validated on only 47 documents (1,445 EDUs), our system achieves encouraging performance results with an F-score of 73% for finding EDUs.

pdf bib
Exploitation d’une ressource lexicale pour la construction d’un étiqueteur morpho-syntaxique état-de-l’art du français
Pascal Denis | Benoît Sagot
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente MEltfr, un étiqueteur morpho-syntaxique automatique du français. Il repose sur un modèle probabiliste séquentiel qui bénéficie d’informations issues d’un lexique exogène, à savoir le Lefff. Evalué sur le FTB, MEltfr atteint un taux de précision de 97.75% (91.36% sur les mots inconnus) sur un jeu de 29 étiquettes. Ceci correspond à une diminution du taux d’erreur de 18% (36.1% sur les mots inconnus) par rapport au même modèle sans couplage avec le Lefff. Nous étudions plus en détail la contribution de cette ressource, au travers de deux séries d’expériences. Celles-ci font apparaître en particulier que la contribution des traits issus du Lefff est de permettre une meilleure couverture, ainsi qu’une modélisation plus fine du contexte droit des mots.

2009

pdf bib
Analyse syntaxique du français : des constituants aux dépendances
Marie Candito | Benoît Crabbé | Pascal Denis | François Guérin
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente une technique d’analyse syntaxique statistique à la fois en constituants et en dépendances. L’analyse procède en ajoutant des étiquettes fonctionnelles aux sorties d’un analyseur en constituants, entraîné sur le French Treebank, pour permettre l’extraction de dépendances typées. D’une part, nous spécifions d’un point de vue formel et linguistique les structures de dépendances à produire, ainsi que la procédure de conversion du corpus en constituants (le French Treebank) vers un corpus cible annoté en dépendances, et partiellement validé. D’autre part, nous décrivons l’approche algorithmique qui permet de réaliser automatiquement le typage des dépendances. En particulier, nous nous focalisons sur les méthodes d’apprentissage discriminantes d’étiquetage en fonctions grammaticales.

pdf bib
Coupling an Annotated Corpus and a Morphosyntactic Lexicon for State-of-the-Art POS Tagging with Less Human Effort
Pascal Denis | Benoît Sagot
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

2008

pdf bib
Specialized Models and Ranking for Coreference Resolution
Pascal Denis | Jason Baldridge
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
Joint Determination of Anaphoricity and Coreference Resolution using Integer Programming
Pascal Denis | Jason Baldridge
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference