James Henderson

Also published as: James B. Henderson


2021

Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement
Alireza Mohammadshahi | James Henderson
Transactions of the Association for Computational Linguistics, Volume 9

We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. We demonstrate the power and effectiveness of RNGTr on several dependency corpora, using a refinement model pre-trained with BERT. We also introduce Syntactic Transformer (SynTr), a non-recursive parser similar to our refinement model. RNGTr can improve the accuracy of a variety of initial parsers on 13 languages from the Universal Dependencies Treebanks, the English and Chinese Penn Treebanks, and the German CoNLL2009 corpus, even surpassing the new state-of-the-art results achieved by SynTr, significantly improving the state of the art for all corpora tested.

The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task
James Barry | Alireza Mohammadshahi | Joachim Wagner | Jennifer Foster | James Henderson
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

We describe the DCU-EPFL submission to the IWPT 2021 Parsing Shared Task: From Raw Text to Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, which extend basic dependency trees to better represent semantic structure. Evaluation is carried out on 29 treebanks in 17 languages, and participants are required to parse the data for each language starting from raw strings. Our approach uses the Stanza pipeline to preprocess the text files, XLM-RoBERTa to obtain contextualized token representations, and an edge-scoring and labeling model to predict the enhanced graph. Finally, we run a postprocessing script to ensure that all of our outputs are valid Enhanced UD graphs. Our system places 6th out of 9 participants with a coarse Enhanced Labeled Attachment Score (ELAS) of 83.57. We carry out additional post-deadline experiments, which include using Trankit for preprocessing, XLM-RoBERTa LARGE, treebank concatenation, and multitask learning between a basic and an enhanced dependency parser. All of these modifications improve our initial score, and our final system has a coarse ELAS of 88.04.

Multi-Adversarial Learning for Cross-Lingual Word Embeddings
Haozhou Wang | James Henderson | Paola Merlo
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Generative adversarial networks (GANs) have succeeded in inducing cross-lingual word embeddings (maps of matching words across languages) without supervision. Despite these successes, GANs’ performance in the difficult case of distant languages is still not satisfactory. These limitations have been explained by GANs’ incorrect assumption that source and target embedding spaces are related by a single linear mapping and are approximately isomorphic. We assume instead that, especially across distant languages, the mapping is only piece-wise linear, and propose a multi-adversarial learning method. This novel method induces the seed cross-lingual dictionary through multiple mappings, each induced to fit the mapping for one subspace. Our experiments on unsupervised bilingual lexicon induction and cross-lingual document classification show that this method improves performance over previous single-mapping methods, especially for distant languages.
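
To make the piece-wise linear assumption concrete, here is a minimal sketch in plain Python. It assumes each source vector is routed to the subspace whose centroid is nearest and is then mapped by that subspace's own matrix. The routing rule, the names `route` and `piecewise_map`, and the toy matrices are illustrative assumptions; the paper learns its mappings adversarially, which this sketch does not attempt.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def route(v: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Assumed routing rule: index of the nearest centroid (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
    return dists.index(min(dists))


def piecewise_map(v: Sequence[float],
                  centroids: Sequence[Sequence[float]],
                  mappings: Sequence[Matrix]) -> Vector:
    """Apply the linear map of the subspace that v is routed to.

    A single global linear mapping is the special case len(mappings) == 1.
    """
    return matvec(mappings[route(v, centroids)], v)


# Two toy subspaces in 2-D, each with its own linear map (hypothetical values).
centroids = [[1.0, 0.0], [-1.0, 0.0]]
mappings = [
    [[0.0, -1.0], [1.0, 0.0]],  # subspace 0: rotate by 90 degrees
    [[2.0, 0.0], [0.0, 2.0]],   # subspace 1: scale by 2
]
print(piecewise_map([0.9, 0.1], centroids, mappings))   # routed to subspace 0
print(piecewise_map([-0.8, 0.3], centroids, mappings))  # routed to subspace 1
```

Because different regions of the source space get different matrices, the overall map can bend in ways a single rotation or linear transform cannot, which is the intuition behind the multi-mapping approach.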

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
Rabeeh Karimi Mahabadi | Sebastian Ruder | Mostafa Dehghani | James Henderson
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model. This parameter-efficient multi-task learning framework allows us to achieve the best of both worlds by sharing knowledge across tasks via hypernetworks while enabling the model to adapt to each individual task through task-specific adapters. Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% more parameters per task. We additionally demonstrate substantial performance improvements in few-shot domain generalization across a variety of tasks. Our code is publicly available at https://github.com/rabeehk/hyperformer.
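
The central idea here, generating adapter parameters from a shared hypernetwork conditioned on task and layer, can be sketched in plain Python. This is a minimal illustration under stated assumptions: `AdapterHypernet`, the single linear generator, and the embedding sizes are hypothetical choices, the adapter-position input and the up-projection are left out, and nothing is taken from the hyperformer repository.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class AdapterHypernet:
    """Hypothetical shared hypernetwork for bottleneck adapters.

    Each task and each layer gets a small embedding; one shared linear map
    turns concat(task_emb, layer_emb) into the flattened weights of an
    adapter's down-projection (bottleneck x d_model).
    """

    def __init__(self, n_tasks: int, n_layers: int, emb_dim: int,
                 d_model: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.task_emb = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
                         for _ in range(n_tasks)]
        self.layer_emb = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
                          for _ in range(n_layers)]
        self.d_model, self.bottleneck = d_model, bottleneck
        # Shared generator: its size does not depend on n_tasks or n_layers.
        self.generator = [[rng.gauss(0.0, 0.02) for _ in range(2 * emb_dim)]
                          for _ in range(bottleneck * d_model)]

    def down_projection(self, task: int, layer: int) -> Matrix:
        """Generate the (bottleneck x d_model) matrix for one task and layer."""
        z = self.task_emb[task] + self.layer_emb[layer]  # list concatenation
        flat = matvec(self.generator, z)
        return [flat[i * self.d_model:(i + 1) * self.d_model]
                for i in range(self.bottleneck)]


hyper = AdapterHypernet(n_tasks=3, n_layers=12, emb_dim=8, d_model=16, bottleneck=4)
w_task0 = hyper.down_projection(task=0, layer=5)
w_task1 = hyper.down_projection(task=1, layer=5)
print(len(w_task0), len(w_task0[0]))  # 4 16
```

In this sketch only the small task and layer embeddings grow with the number of tasks and layers; the generator is shared, which is where a design of this kind saves parameters compared with storing a separate adapter per task and layer.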

2020

Partially-supervised Mention Detection
Lesly Miculicich | James Henderson
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

Learning to detect entity mentions without using syntactic information can be useful for integration and joint optimization with other tasks. However, it is common to have partially annotated data for this problem. Here, we investigate two approaches to deal with partial annotation of mentions: weighted loss and soft-target classification. We also propose two neural mention detection approaches: sequence tagging and exhaustive search. We evaluate our methods with coreference resolution as a downstream task, using multitask learning. The results show that recall and F1 score improve for all methods.

Graph-to-Graph Transformer for Transition-based Dependency Parsing
Alireza Mohammadshahi | James Henderson
Findings of the Association for Computational Linguistics: EMNLP 2020

We propose the Graph2Graph Transformer architecture for conditioning on and predicting arbitrary graphs, and apply it to the challenging task of transition-based dependency parsing. After proposing two novel Transformer models of transition-based dependency parsing as strong baselines, we show that adding Graph2Graph Transformer's proposed mechanisms for conditioning on and predicting graphs results in significant improvements, both with and without BERT pre-training. The novel baselines and their integration with Graph2Graph Transformer significantly outperform the state-of-the-art in traditional transition-based dependency parsing on both the English Penn Treebank and 13 languages of the Universal Dependencies Treebanks. Graph2Graph Transformer can be integrated with many previous structured prediction methods, making it easy to apply to a wide range of NLP tasks.

The Unstoppable Rise of Computational Linguistics in Deep Learning
James Henderson
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of variable binding and its instantiation in attention-based models, and argue that the Transformer is not a sequence model but an induced-structure model. This perspective leads to predictions of the challenges facing research in deep learning architectures for natural language understanding.

End-to-End Bias Mitigation by Modelling Biases in Corpora
Rabeeh Karimi Mahabadi | Yonatan Belinkov | James Henderson
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Several recent studies have shown that strong natural language understanding (NLU) models are prone to relying on unwanted dataset biases without learning the underlying task, resulting in models that fail to generalize to out-of-domain datasets and are likely to perform poorly in real-world scenarios. We propose two learning strategies to train neural models that are more robust to such biases and transfer better to out-of-domain datasets. The biases are specified in terms of one or more bias-only models, which learn to leverage the dataset biases. During training, the bias-only models’ predictions are used to adjust the loss of the base model to reduce its reliance on biases by down-weighting the biased examples and focusing training on the hard examples. We experiment on large-scale natural language inference and fact verification benchmarks, evaluating on out-of-domain datasets that are specifically designed to assess the robustness of models against known biases in the training data. Results show that our debiasing methods greatly improve robustness in all settings and transfer better to other textual entailment datasets. Our code and data are publicly available at https://github.com/rabeehk/robust-nli.
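
The down-weighting step can be illustrated with a small loss function. This is a minimal sketch assuming a focal-style weight of (1 - p_bias(gold)) ** gamma on each example's cross-entropy; the weighting formula, `gamma`, and the name `debiased_loss` are assumptions for illustration, and the exact objectives used in the paper may differ.

```python
import math
from typing import Sequence


def debiased_loss(base_probs: Sequence[Sequence[float]],
                  bias_probs: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  gamma: float = 2.0) -> float:
    """Mean cross-entropy of the base model with per-example weights.

    Assumed weighting: (1 - p_bias(gold)) ** gamma. Examples that the
    bias-only model already gets right with high confidence contribute
    little, so training focuses on the hard examples.
    """
    total = 0.0
    for p_base, p_bias, y in zip(base_probs, bias_probs, labels):
        weight = (1.0 - p_bias[y]) ** gamma
        total += -weight * math.log(p_base[y])
    return total / len(labels)


base = [[0.5, 0.5], [0.5, 0.5]]
bias = [[0.0, 1.0],   # bias-only model certain and correct: weight 0
        [1.0, 0.0]]   # bias-only model certain and wrong: weight 1
labels = [1, 1]
print(debiased_loss(base, bias, labels))
```

The base model's predictions are identical on both toy examples, yet only the second one, which the bias-only model cannot solve, contributes to the loss.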

Plug and Play Autoencoders for Conditional Text Generation
Florian Mai | Nikolaos Pappas | Ivan Montero | Noah A. Smith | James Henderson
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods that are plug and play, where any pretrained autoencoder can be used, and that only require learning a mapping within the autoencoder’s embedding space, training embedding-to-embedding (Emb2Emb). This reduces the need for labeled training data for the task and makes the training procedure more efficient. Crucial to the success of this method is a loss term for keeping the mapped embedding on the manifold of the autoencoder and a mapping which is trained to navigate the manifold by learning offset vectors. Evaluations on style transfer tasks both with and without sequence-to-sequence supervision show that our method performs better than or comparably to strong baselines while being up to four times faster.

2019

Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings
Haozhou Wang | James Henderson | Paola Merlo
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Distributed representations of words which map each word to a continuous vector have proven useful in capturing important linguistic information not only in a single language but also across different languages. Current unsupervised adversarial approaches show that it is possible to build a mapping matrix that aligns two sets of monolingual word embeddings without high-quality parallel data, such as a dictionary or a sentence-aligned corpus. However, without an additional step of refinement, the preliminary mapping learnt by these methods is unsatisfactory, leading to poor performance for typologically distant languages. In this paper, we propose a weakly-supervised adversarial training method to overcome this limitation, based on the intuition that mapping across languages is better done at the concept level than at the word level. We propose a concept-based adversarial training method which improves the performance of previous unsupervised adversarial methods for most languages, and especially for typologically distant language pairs.

GILE: A Generalized Input-Label Embedding for Text Classification
Nikolaos Pappas | James Henderson
Transactions of the Association for Computational Linguistics, Volume 7

Neural text classification models typically treat output labels as categorical variables that lack description and semantics. This forces their parametrization to be dependent on the label set size, and, hence, they are unable to scale to large label sets and generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels often come at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint nonlinear input-label embedding with controllable capacity and a joint-space-dependent classification unit that is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models that do not leverage label semantics and previous joint input-label space models in both scenarios.

2018

Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation
Xiao Pu | Nikolaos Pappas | James Henderson | Andrei Popescu-Belis
Transactions of the Association for Computational Linguistics, Volume 6

This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, and the use of a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are more than 1 BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, or +20% when scored manually over several challenging words.
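
The sense selection mechanism based on a weighted average of sense vectors can be illustrated as follows. The sketch assumes the weights come from a softmax over dot products between a context vector and each sense vector, and that the averaged sense vector is concatenated with the word vector; these choices and the function names are assumptions for illustration, not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)  # subtract the max before exponentiating
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sense_aware_input(word_vec: Sequence[float],
                      sense_vecs: Sequence[Sequence[float]],
                      context_vec: Sequence[float]) -> List[float]:
    """Concatenate a word vector with a context-weighted average of its senses."""
    scores = [sum(a * b for a, b in zip(s, context_vec)) for s in sense_vecs]
    weights = softmax(scores)
    dim = len(sense_vecs[0])
    avg_sense = [sum(w * s[i] for w, s in zip(weights, sense_vecs))
                 for i in range(dim)]
    return list(word_vec) + avg_sense


# A toy word with two senses; the context points strongly towards sense 0.
word = [0.2, 0.4]
senses = [[1.0, 0.0], [0.0, 1.0]]
context = [5.0, 0.0]
vec = sense_aware_input(word, senses, context)
print(len(vec))  # 4
```

A soft weighted average keeps the selection differentiable, so in a setting like this it can be trained jointly with the translation model rather than committing to one hard sense per token.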

Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation
Nikolaos Pappas | Lesly Miculicich | James Henderson
Proceedings of the Third Conference on Machine Translation: Research Papers

Tying the weights of the target word embeddings with the target word classifiers of neural machine translation models leads to faster training and often to better translation quality. Given the success of this parameter sharing, we investigate other forms of sharing between no sharing and hard equality of parameters. In particular, we propose a structure-aware output layer which captures the semantic structure of the output space of words within a joint input-output embedding. The model is a generalized form of weight tying which shares parameters but allows learning a more flexible relationship with input word embeddings and allows the effective capacity of the output layer to be controlled. In addition, the model shares weights across output classifiers and translation contexts, which allows it to better leverage prior knowledge about them. Our evaluation on English-to-Finnish and English-to-German datasets shows the effectiveness of the method against strong encoder-decoder baselines trained with or without weight tying.
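
For reference, plain weight tying, the baseline this paper generalizes, reuses the target embedding matrix as the output classifier, so each word's logit is the dot product of the decoder state with that word's embedding. The sketch below shows only this baseline, not the proposed structure-aware joint input-output layer; the class name and toy values are hypothetical.

```python
from typing import List, Sequence


class TiedOutputLayer:
    """Output layer whose weights are the shared target embedding table."""

    def __init__(self, embeddings: List[List[float]]) -> None:
        self.embeddings = embeddings  # same object used for input lookups

    def embed(self, word_id: int) -> List[float]:
        """Input side: look up a target word's embedding."""
        return self.embeddings[word_id]

    def logits(self, hidden: Sequence[float]) -> List[float]:
        """Output side: logit[v] = <hidden, E[v]>; no separate output matrix."""
        return [sum(h * e for h, e in zip(hidden, row)) for row in self.embeddings]


E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer = TiedOutputLayer(E)
scores = layer.logits([2.0, 1.0])
print(scores)  # [2.0, 1.0, 3.0]
```

Because input lookups and output scores read the same table, any update to a word's embedding changes both at once; the paper investigates forms of sharing that relax this hard equality.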

Document-Level Neural Machine Translation with Hierarchical Attention Networks
Lesly Miculicich | Dhananjay Ram | Nikolaos Pappas | James Henderson
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural Machine Translation (NMT) can be improved by including document-level contextual information. For this purpose, we propose a hierarchical attention model to capture the context in a structured and dynamic manner. The model is integrated in the original NMT architecture as another level of abstraction, conditioning on the NMT model’s own previous hidden states. Experiments show that hierarchical attention significantly improves the BLEU score over a strong NMT baseline with the state-of-the-art in context-aware methods, and that both the encoder and decoder benefit from context in complementary ways.

2017

CLCL (Geneva) DINN Parser: a Neural Network Dependency Parser Ten Years Later
Christophe Moor | Paola Merlo | James Henderson | Haozhou Wang
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper describes the University of Geneva’s submission to the CoNLL 2017 shared task Multilingual Parsing from Raw Text to Universal Dependencies (listed as the CLCL (Geneva) entry). Our submitted parsing system is the grandchild of the first transition-based neural network dependency parser, which was the University of Geneva’s entry in the CoNLL 2007 multilingual dependency parsing shared task, with some improvements to speed and portability. These results provide a baseline for investigating how far we have come in the past ten years of work on neural network dependency parsing.

2016

A Vector Space for Distributional Semantics for Entailment
James Henderson | Diana Popa
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modeling the Non-Substitutability of Multiword Expressions with Distributional Semantics and a Log-Linear Model
Meghdad Farahmand | James Henderson
Proceedings of the 12th Workshop on Multiword Expressions

2015

A Model of Zero-Shot Learning of Spoken Language Understanding
Majid Yazdani | James Henderson
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Named entity recognition with document-specific KB tag gazetteers
Will Radford | Xavier Carreras | James Henderson
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Learning Semantic Composition to Detect Non-compositionality of Multiword Expressions
Majid Yazdani | Meghdad Farahmand | James Henderson
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training
Majid Yazdani | James Henderson
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

2014

Undirected Machine Translation with Discriminative Reinforcement Learning
Andrea Gesmundo | James Henderson
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

The PARLANCE mobile application for interactive search in English and Mandarin
Helen Hastie | Marie-Aude Aufaure | Panos Alexopoulos | Hugues Bouchard | Catherine Breslin | Heriberto Cuayáhuitl | Nina Dethlefs | Milica Gašić | James Henderson | Oliver Lemon | Xingkun Liu | Peter Mika | Nesrine Ben Mustapha | Tim Potter | Verena Rieser | Blaise Thomson | Pirros Tsiakoulis | Yves Vanrompay | Boris Villazon-Terrazas | Majid Yazdani | Steve Young | Yanchao Yu
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2013

Graph-Based Seed Set Expansion for Relation Extraction Using Random Walk Hitting Times
Joel Lang | James Henderson
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Demonstration of the PARLANCE system: a data-driven incremental, spoken dialogue system for interactive search
Helen Hastie | Marie-Aude Aufaure | Panos Alexopoulos | Heriberto Cuayáhuitl | Nina Dethlefs | Milica Gasic | James Henderson | Oliver Lemon | Xingkun Liu | Peter Mika | Nesrine Ben Mustapha | Verena Rieser | Blaise Thomson | Pirros Tsiakoulis | Yves Vanrompay
Proceedings of the SIGDIAL 2013 Conference

Multilingual Joint Parsing of Syntactic and Semantic Dependencies with a Latent Variable Model
James Henderson | Paola Merlo | Ivan Titov | Gabriele Musillo
Computational Linguistics, Volume 39, Issue 4 - December 2013

2012

Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Jun’ichi Tsujii | James Henderson | Marius Paşca
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Unsupervised Semantic Role Induction with Global Role Ordering
Nikhil Garg | James Henderson
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Heuristic Cube Pruning in Linear Time
Andrea Gesmundo | Giorgio Satta | James Henderson
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

Bayesian Network Automata for Modelling Unbounded Structures
James Henderson
Proceedings of the 12th International Conference on Parsing Technologies

Heuristic Search for Non-Bottom-Up Tree Structure Prediction
Andrea Gesmundo | James Henderson
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Temporal Restricted Boltzmann Machines for Dependency Parsing
Nikhil Garg | James Henderson
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Scaling up Automatic Cross-Lingual Semantic Role Annotation
Lonneke van der Plas | Paola Merlo | James Henderson
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Faster cube pruning
Andrea Gesmundo | James Henderson
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

2009

Domain Adaptation with Artificial Data for Semantic Parsing of Speech
Lonneke van der Plas | James Henderson | Paola Merlo
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

A Latent Variable Model of Synchronous Syntactic-Semantic Parsing for Multiple Languages
Andrea Gesmundo | James Henderson | Paola Merlo | Ivan Titov
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

2008

A Latent Variable Model of Synchronous Parsing for Syntactic and Semantic Dependencies
James Henderson | Paola Merlo | Gabriele Musillo | Ivan Titov
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets
James Henderson | Oliver Lemon | Kallirroi Georgila
Computational Linguistics, Volume 34, Number 4, December 2008

Mixture Model POMDPs for Efficient Handling of Uncertainty in Dialogue Management
James Henderson | Oliver Lemon
Proceedings of ACL-08: HLT, Short Papers

2007

A Latent Variable Model for Generative Dependency Parsing
Ivan Titov | James Henderson
Proceedings of the Tenth International Conference on Parsing Technologies

Fast and Robust Multilingual Dependency Parsing with a Generative Latent Variable Model
Ivan Titov | James Henderson
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Constituent Parsing with Incremental Sigmoid Belief Networks
Ivan Titov | James Henderson
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Loss Minimization in Parse Reranking
Ivan Titov | James Henderson
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

Porting Statistical Parsers with Data-Defined Kernels
Ivan Titov | James Henderson
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

An ISU Dialogue System Exhibiting Reinforcement Learning of Dialogue Policies: Generic Slot-Filling in the TALK In-car System
Oliver Lemon | Kallirroi Georgila | James Henderson | Matthew Stuttle
Demonstrations

2005

Data-Defined Kernels for Parse Reranking Derived from Probabilistic Models
James Henderson | Ivan Titov
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Lookahead in Deterministic Left-Corner Parsing
James Henderson
Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

Discriminative Training of a Neural Network Statistical Parser
James Henderson
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Inducing History Representations for Broad Coverage Statistical Parsing
James Henderson
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

Generative versus Discriminative Models for Statistical Left-Corner Parsing
James Henderson
Proceedings of the Eighth International Conference on Parsing Technologies

We propose two statistical left-corner parsers and investigate their accuracy at varying speeds. The parser based on a generative probability model achieves state-of-the-art accuracy when sufficient time is available, but when high speed is required the parser based on a discriminative probability model performs better. Neural network probability estimation is used to handle conditioning on both the unbounded parse histories and the unbounded lookahead strings.

Neural Network Probability Estimation for Broad Coverage Parsing
James Henderson
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

Using Syntactic Analysis to Increase Efficiency in Visualizing Text Collections
James Henderson | Paola Merlo | Ivan Petroff | Gerold Schneider
COLING 2002: The 19th International Conference on Computational Linguistics

2000

A Neural Network Parser that Handles Sparse Data
James Henderson
Proceedings of the Sixth International Workshop on Parsing Technologies

Previous work has demonstrated the viability of a particular neural network architecture, Simple Synchrony Networks, for syntactic parsing. Here we present additional results on the performance of this type of parser, including direct comparisons on the same dataset with a standard statistical parsing method, Probabilistic Context Free Grammars. We focus these experiments on demonstrating one of the main advantages of the SSN parser over the PCFG: handling sparse data. We use smaller datasets than are typically used with statistical methods, resulting in the PCFG finding parses for under half of the test sentences, while the SSN finds parses for all sentences. Even on the half parsed by the PCFG, the SSN performs better than the PCFG, as measured by recall and precision on both constituents and a dependency-like measure.

1998

A Connectionist Architecture for Learning to Parse
James Henderson | Peter Lane
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

A Connectionist Architecture for Learning to Parse
James Henderson | Peter Lane
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1993

Connectionist Approaches to Natural Language Processing
James Henderson
Computational Linguistics, Volume 19, Number 3, September 1993

1992

A Connectionist Parser for Structure Unification Grammar
James B. Henderson
30th Annual Meeting of the Association for Computational Linguistics

1991

An Incremental Connectionist Phrase Structure Parser
James Henderson
29th Annual Meeting of the Association for Computational Linguistics