Henry Elder


2020

ADAPT at SR’20: How Preprocessing and Data Augmentation Help to Improve Surface Realization
Henry Elder
Proceedings of the Third Workshop on Multilingual Surface Realisation

In this paper, we describe the ADAPT submission to the Surface Realization Shared Task 2020. We present a neural-based system trained on the English Web Treebank and an augmented dataset, automatically created from existing text corpora.

How to Make Neural Natural Language Generation as Reliable as Templates in Task-Oriented Dialogue
Henry Elder | Alexander O’Connor | Jennifer Foster
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Neural Natural Language Generation (NLG) systems are well known for their unreliability. To overcome this issue, we propose a data augmentation approach which allows us to restrict the output of a network and guarantee reliability. While this restriction means generation will be less diverse than if randomly sampled, we include experiments that demonstrate the tendency of existing neural generation approaches to produce dull and repetitive text, and we argue that reliability is more important than diversity for this task. The system trained using this approach scored 100% in semantic accuracy on the E2E NLG Challenge dataset, the same as a template system.

Shape of Synth to Come: Why We Should Use Synthetic Data for English Surface Realization
Henry Elder | Robert Burke | Alexander O’Connor | Jennifer Foster
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The Surface Realization Shared Tasks of 2018 and 2019 were Natural Language Generation shared tasks with the goal of exploring approaches to surface realization from Universal-Dependency-like trees to surface strings for several languages. In the 2018 shared task there was very little difference in the absolute performance of systems trained with and without additional, synthetically created data, and a new rule prohibiting the use of synthetic data was introduced for the 2019 shared task. Contrary to the findings of the 2018 shared task, we show, in experiments on the English 2018 dataset, that the use of synthetic data can have a substantial positive effect – an improvement of almost 8 BLEU points for a previously state-of-the-art system. We analyse the effects of synthetic data, and we argue that its use should be encouraged rather than prohibited so that future research efforts continue to explore systems that can take advantage of such data.

2019

Designing a Symbolic Intermediate Representation for Neural Surface Realization
Henry Elder | Jennifer Foster | James Barry | Alexander O’Connor
Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation

Generated output from neural NLG systems often contains errors such as hallucination, repetition or contradiction. This work focuses on designing a symbolic intermediate representation to be used in multi-stage neural generation with the intention of reducing the frequency of failed outputs. We show that surface realization from this intermediate representation is of high quality and that, when the full system is applied to the E2E dataset, it outperforms the winner of the E2E challenge. Furthermore, by breaking out the surface realization step from typically end-to-end neural systems, we also provide a framework for non-neural content selection and planning systems to potentially take advantage of semi-supervised pretraining of neural surface realization models.

2018

Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models
Henry Elder | Chris Hokamp
Proceedings of the First Workshop on Multilingual Surface Realisation

This work presents state-of-the-art results in the reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques which make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.

End-to-End Content and Plan Selection for Data-to-Text Generation
Sebastian Gehrmann | Falcon Dai | Henry Elder | Alexander Rush
Proceedings of the 11th International Conference on Natural Language Generation

Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG. This problem can be challenging when the form of the structured data varies between examples. This paper presents a survey of several extensions to sequence-to-sequence models to account for the latent content selection process, particularly variants of copy attention and coverage decoding. We further propose a training method based on diverse ensembling to encourage models to learn distinct sentence templates during training. An empirical evaluation of these techniques shows an increase in the quality of generated text across five automated metrics, as well as human evaluation.

E2E NLG Challenge Submission: Towards Controllable Generation of Diverse Natural Language
Henry Elder | Sebastian Gehrmann | Alexander O’Connor | Qun Liu
Proceedings of the 11th International Conference on Natural Language Generation

In natural language generation (NLG), the task is to generate utterances from a more abstract input, such as structured data. An added challenge is to generate utterances that contain an accurate representation of the input, while reflecting the fluency and variety of human-generated text. In this paper, we report experiments with NLG models that can be used in task-oriented dialogue systems. We explore the use of additional input to the model to encourage diversity and control of outputs. While our submission does not rank highly using automated metrics, qualitative investigation of generated utterances suggests the use of additional information in neural network NLG systems to be a promising research direction.