Ehud Reiter


2021

A Systematic Review of Reproducibility Research in Natural Language Processing
Anya Belz | Shubham Agarwal | Anastasia Shimorina | Ehud Reiter
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Against the background of what has been termed a reproducibility crisis in science, the NLP field is becoming increasingly interested in, and conscientious about, the reproducibility of its results. The past few years have seen an impressive range of new initiatives, events and active research in the area. However, the field is far from reaching a consensus about how reproducibility should be defined, measured and addressed, with diversity of views currently increasing rather than converging. With this focused contribution, we aim to provide a wide-angle, and as near as possible complete, snapshot of current work on reproducibility in NLP.

Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
Anya Belz | Shubham Agarwal | Yvette Graham | Ehud Reiter | Anastasia Shimorina
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

Towards Objectively Evaluating the Quality of Generated Medical Summaries
Francesco Moramarco | Damir Juric | Aleksandar Savkov | Ehud Reiter
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

We propose a method for evaluating the quality of generated text by asking evaluators to count facts, and computing precision, recall, f-score, and accuracy from the raw counts. We believe this approach leads to an evaluation that is more objective and easier to reproduce. We apply this to the task of medical report summarisation, where measuring objective quality and accuracy is of paramount importance.

A Preliminary Study on Evaluating Consultation Notes With Post-Editing
Francesco Moramarco | Alex Papadopoulos Korfiatis | Aleksandar Savkov | Ehud Reiter
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

Automatic summarisation has the potential to aid physicians in streamlining clerical tasks such as note taking. But it is notoriously difficult to evaluate these systems and demonstrate that they are safe to be used in a clinical setting. To circumvent this issue, we propose a semi-automatic approach whereby physicians post-edit generated notes before submitting them. We conduct a preliminary study on the time saving of automatically generated consultation notes with post-editing. Our evaluators are asked to listen to mock consultations and to post-edit three generated notes. We time this and find that it is faster than writing the note from scratch. We present insights and lessons learnt from this experiment.

Proceedings of the 14th International Conference on Natural Language Generation
Anya Belz | Angela Fan | Ehud Reiter | Yaji Sripada
Proceedings of the 14th International Conference on Natural Language Generation

Explaining Decision-Tree Predictions by Addressing Potential Conflicts between Predictions and Plausible Expectations
Sameen Maruf | Ingrid Zukerman | Ehud Reiter | Gholamreza Haffari
Proceedings of the 14th International Conference on Natural Language Generation

We offer an approach to explain Decision Tree (DT) predictions by addressing potential conflicts between aspects of these predictions and plausible expectations licensed by background information. We define four types of conflicts, operationalize their identification, and specify explanatory schemas that address them. Our human evaluation focused on the effect of explanations on users’ understanding of a DT’s reasoning and their willingness to act on its predictions. The results show that (1) explanations that address potential conflicts are considered at least as good as baseline explanations that just follow a DT path; and (2) the conflict-based explanations are deemed especially valuable when users’ expectations disagree with the DT’s predictions.

Generation Challenges: Results of the Accuracy Evaluation Shared Task
Craig Thomson | Ehud Reiter
Proceedings of the 14th International Conference on Natural Language Generation

The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems, in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches and techniques. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors which are semantically or pragmatically complex (for example, based on incorrect computation or inference).

The ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results
Anya Belz | Anastasia Shimorina | Shubham Agarwal | Ehud Reiter
Proceedings of the 14th International Conference on Natural Language Generation

The NLP field has recently seen a substantial increase in work related to reproducibility of results, and more generally in recognition of the importance of having shared definitions and practices relating to evaluation. Much of the work on reproducibility has so far focused on metric scores, with reproducibility of human evaluation results receiving far less attention. As part of a research programme designed to develop theory and practice of reproducibility assessment in NLP, we organised the first shared task on reproducibility of human evaluations, ReproGen 2021. This paper describes the shared task in detail, summarises results from each of the reproduction studies submitted, and provides further comparative analysis of the results. Out of nine initial team registrations, we received submissions from four teams. Meta-analysis of the four reproduction studies revealed varying degrees of reproducibility, and allowed very tentative first conclusions about what types of evaluation tend to have better reproducibility.

2020

Arabic NLG Language Functions
Wael Abed | Ehud Reiter
Proceedings of the 13th International Conference on Natural Language Generation

The Arabic language has received very limited support from NLG researchers. In this paper, we explain the challenges of the core grammar, provide a lexical resource, and implement the first language functions for the Arabic language. We conducted a human evaluation of our functions in generating sentences from the NADA Corpus.

A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Craig Thomson | Ehud Reiter
Proceedings of the 13th International Conference on Natural Language Generation

Most Natural Language Generation systems need to produce accurate texts. We propose a methodology for high-quality human evaluation of the accuracy of generated texts, which is intended to serve as a gold standard for accuracy evaluations of data-to-text systems. We use our methodology to evaluate the accuracy of computer-generated basketball summaries. We then show how our gold standard evaluation can be used to validate automated metrics.

Shared Task on Evaluating Accuracy
Ehud Reiter | Craig Thomson
Proceedings of the 13th International Conference on Natural Language Generation

We propose a shared task on methodologies and algorithms for evaluating the accuracy of generated texts, specifically summaries of basketball games produced from basketball box score and other game data. We welcome submissions based on protocols for human evaluation, automatic metrics, as well as combinations of human evaluations and metrics.

ReproGen: Proposal for a Shared Task on Reproducibility of Human Evaluations in NLG
Anya Belz | Shubham Agarwal | Anastasia Shimorina | Ehud Reiter
Proceedings of the 13th International Conference on Natural Language Generation

Across NLP, a growing body of work is looking at the issue of reproducibility. However, replicability of human evaluation experiments and reproducibility of their results are currently under-addressed, and this is of particular concern for NLG where human evaluations are the norm. This paper outlines our ideas for a shared task on reproducibility of human evaluations in NLG which aims (i) to shed light on the extent to which past NLG evaluations are replicable and reproducible, and (ii) to draw conclusions regarding how evaluations can be designed and reported to increase replicability and reproducibility. If the task is run over several years, we hope to be able to document an overall increase in levels of replicability and reproducibility over time.

SportSett:Basketball - A robust and maintainable data-set for Natural Language Generation
Craig Thomson | Ehud Reiter | Somayajulu Sripada
Proceedings of the Workshop on Intelligent Information Processing and Natural Language Generation

Iterative Neural Scoring of Validated Insight Candidates
Allmin Susaiyah | Aki Härmä | Ehud Reiter | Milan Petković
Proceedings of the Workshop on Intelligent Information Processing and Natural Language Generation

How are you? Introducing stress-based text tailoring
Simone Balloccu | Ehud Reiter | Alexandra Johnstone | Claire Fyfe
Proceedings of the Workshop on Intelligent Information Processing and Natural Language Generation

Explaining Bayesian Networks in Natural Language: State of the Art and Challenges
Conor Hennessy | Alberto Bugarín | Ehud Reiter
2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence

In order to increase trust in the usage of Bayesian Networks and to cement their role as a model which can aid in critical decision making, the challenge of explainability must be faced. Previous attempts at explaining Bayesian Networks have largely focused on graphical or visual aids. In this paper we aim to highlight the importance of a natural language approach to explanation and to discuss some previous and state-of-the-art attempts at textual explanation of Bayesian Networks. We outline several challenges that remain to be addressed in the generation and validation of natural language explanations of Bayesian Networks. This can serve as a reference for future work on natural language explanations of Bayesian Networks.

2019

Natural Language Generation Challenges for Explainable AI
Ehud Reiter
Proceedings of the 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI 2019)

2018

A Structured Review of the Validity of BLEU
Ehud Reiter
Computational Linguistics, Volume 44, Issue 3 - September 2018

The BLEU metric has been widely used in NLP for over 15 years to evaluate NLP systems, especially in machine translation and natural language generation. I present a structured review of the evidence on whether BLEU is a valid evaluation technique—in other words, whether BLEU scores correlate with real-world utility and user-satisfaction of NLP systems; this review covers 284 correlations reported in 34 papers. Overall, the evidence supports using BLEU for diagnostic evaluation of MT systems (which is what it was originally proposed for), but does not support using BLEU outside of MT, for evaluation of individual texts, or for scientific hypothesis testing.

Comprehension Driven Document Planning in Natural Language Generation Systems
Craig Thomson | Ehud Reiter | Somayajulu Sripada
Proceedings of the 11th International Conference on Natural Language Generation

This paper proposes an approach to NLG system design which focuses on generating output text which can be more easily processed by the reader. Ways in which cognitive theory might be combined with existing NLG techniques are discussed and two simple experiments in content ordering are presented.

Generating Summaries of Sets of Consumer Products: Learning from Experiments
Kittipitch Kuptavanich | Ehud Reiter | Kees van Deemter | Advaith Siddharthan
Proceedings of the 11th International Conference on Natural Language Generation

We explored the task of creating a textual summary describing a large set of objects characterised by a small number of features using an e-commerce dataset. When a set of consumer products is large and varied, it can be difficult for a consumer to understand how the products in the set differ; consequently, it can be challenging to choose the most suitable product from the set. To assist consumers, we generated high-level summaries of product sets. Two generation algorithms are presented, discussed, and evaluated with human users. Our evaluation results suggest a positive contribution to consumers’ understanding of the domain.

Meteorologists and Students: A resource for language grounding of geographical descriptors
Alejandro Ramos-Soto | Ehud Reiter | Kees van Deemter | Jose Alonso | Albert Gatt
Proceedings of the 11th International Conference on Natural Language Generation

We present a data resource which can be useful for research purposes on language grounding tasks in the context of geographical referring expression generation. The resource is composed of two data sets that encompass 25 different geographical descriptors and a set of associated graphical representations, drawn as polygons on a map by two groups of human subjects: teenage students and expert meteorologists.

2017

Proceedings of the 10th International Conference on Natural Language Generation
Jose M. Alonso | Alberto Bugarín | Ehud Reiter
Proceedings of the 10th International Conference on Natural Language Generation

A Commercial Perspective on Reference
Ehud Reiter
Proceedings of the 10th International Conference on Natural Language Generation

I briefly describe some of the commercial work which XXX is doing in referring expression algorithms, and highlight differences between what is commercially important (at least to XXX) and the NLG research literature. In particular, XXX is less interested in generic reference algorithms than in high-quality algorithms for specific types of references, such as components of machines, named entities, and dates.

Textually Summarising Incomplete Data
Stephanie Inglis | Ehud Reiter | Somayajulu Sripada
Proceedings of the 10th International Conference on Natural Language Generation

Many data-to-text NLG systems work with data sets which are incomplete, i.e., some of the data is missing. We have worked with data journalists to understand how they describe incomplete data, and are building NLG algorithms based on these insights. A pilot evaluation showed mixed results, and highlighted several areas where we need to improve our system.

2016

Absolute and Relative Properties in Geographic Referring Expressions
Rodrigo de Oliveira | Somayajulu Sripada | Ehud Reiter
Proceedings of the 9th International Natural Language Generation conference

2015

Designing an Algorithm for Generating Named Spatial References
Rodrigo de Oliveira | Yaji Sripada | Ehud Reiter
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

Creating Textual Driver Feedback from Telemetric Data
Daniel Braun | Ehud Reiter | Advaith Siddharthan
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

2014

Generating Annotated Graphs using the NLG Pipeline Architecture
Saad Mahamood | William Bradshaw | Ehud Reiter
Proceedings of the 8th International Natural Language Generation Conference (INLG)

2013

Generating Expressions that Refer to Visible Objects
Margaret Mitchell | Kees van Deemter | Ehud Reiter
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

MIME - NLG in Pre-Hospital Care
Anne Schneider | Alasdair Mort | Chris Mellish | Ehud Reiter | Phil Wilson | Pierre-Luc Vaudry
Proceedings of the 14th European Workshop on Natural Language Generation

MIME - NLG Support for Complex and Unstable Pre-hospital Emergencies
Anne Schneider | Alasdair Mort | Chris Mellish | Ehud Reiter | Phil Wilson | Pierre-Luc Vaudry
Proceedings of the 14th European Workshop on Natural Language Generation

2012

Working with Clinicians to Improve a Patient-Information NLG System
Saad Mahamood | Ehud Reiter
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

2011

Task-Based Evaluation of NLG Systems: Control vs Real-World Context
Ehud Reiter
Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop

Generating Affective Natural Language for Parents of Neonatal Infants
Saad Mahamood | Ehud Reiter
Proceedings of the 13th European Workshop on Natural Language Generation

What is in a text and what does it do: Qualitative Evaluations of an NLG system – the BT-Nurse – using content analysis and discourse analysis
Rahul Sambaraju | Ehud Reiter | Robert Logie | Andy Mckinlay | Chris McVittie | Albert Gatt | Cindy Sykes
Proceedings of the 13th European Workshop on Natural Language Generation

Two Approaches for Generating Size Modifiers
Margaret Mitchell | Kees van Deemter | Ehud Reiter
Proceedings of the 13th European Workshop on Natural Language Generation

2010

Using NLG and Sensors to Support Personal Narrative for Children with Complex Communication Needs
Rolf Black | Joseph Reddington | Ehud Reiter | Nava Tintarev | Annalu Waller
Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies

Automatic generation of conversational utterances and narrative for Augmentative and Alternative Communication: a prototype system
Martin Dempster | Norman Alm | Ehud Reiter
Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies

Natural Reference to Objects in a Visual Domain
Margaret Mitchell | Kees van Deemter | Ehud Reiter
Proceedings of the 6th International Natural Language Generation Conference

2009

An Investigation into the Validity of Some Metrics for Automatically Evaluating Natural Language Generation Systems
Ehud Reiter | Anja Belz
Computational Linguistics, Volume 35, Number 4, December 2009

Le projet BabyTalk : génération de texte à partir de données hétérogènes pour la prise de décision en unité néonatale
François Portet | Albert Gatt | Jim Hunter | Ehud Reiter | Somayajulu Sripada
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Our society generates an ever-growing mass of information, whether in medicine, meteorology, or other fields. The most widely used method for analysing such data is to summarise it in graphical form. However, it has been shown that a textual summary is also an effective mode of presentation. The aim of the BT-45 prototype, developed as part of the BabyTalk project, is to generate summaries of 45 minutes of continuous physiological signals and discrete temporal events in a neonatal intensive care unit (NICU). This article presents the text generation aspect of the prototype. A clinical experiment showed that human-written summaries improve decision making compared with the graphical approach, whereas BT-45 texts give results similar to the graphical approach. An analysis identified some of the limitations of BT-45, but despite these, our work shows that it is possible to automatically produce effective textual summaries of complex data.

Using NLG to Help Language-Impaired Users Tell Stories and Participate in Social Dialogues
Ehud Reiter | Ross Turner | Norman Alm | Rolf Black | Martin Dempster | Annalu Waller
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

Generating Approximate Geographic Descriptions
Ross Turner | Yaji Sripada | Ehud Reiter
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

SimpleNLG: A Realisation Engine for Practical Applications
Albert Gatt | Ehud Reiter
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

2008

Using Spatial Reference Frames to Generate Grounded Textual Summaries of Georeferenced Data
Ross Turner | Somayajulu Sripada | Ehud Reiter | Ian Davy
Proceedings of the Fifth International Natural Language Generation Conference

The Importance of Narrative and Other Lessons from an Evaluation of an NLG System that Summarises Clinical Data
Ehud Reiter | Albert Gatt | François Portet | Marian van der Meulen
Proceedings of the Fifth International Natural Language Generation Conference

2007

An Architecture for Data-to-Text Systems
Ehud Reiter
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

A Comparison of Hedged and Non-hedged NLG Texts
Saad Mahamood | Ehud Reiter | Chris Mellish
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

Last Words: The Shrinking Horizons of Computational Linguistics
Ehud Reiter
Computational Linguistics, Volume 33, Number 2, June 2007

The attribute selection for generation of referring expressions challenge. [Introduction to Shared Task Evaluation Challenge.]
Anja Belz | Albert Gatt | Ehud Reiter | Jette Viethen
Proceedings of the Workshop on Using corpora for natural language generation

2006

GENEVAL: A Proposal for Shared-task Evaluation in NLG
Ehud Reiter | Anja Belz
Proceedings of the Fourth International Natural Language Generation Conference

Comparing Automatic and Human Evaluation of NLG Systems
Anja Belz | Ehud Reiter
11th Conference of the European Chapter of the Association for Computational Linguistics

Generating Spatio-Temporal Descriptions in Pollen Forecasts
Ross Turner | Somayajulu Sripada | Ehud Reiter | Ian P Davy
Demonstrations

2005

Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)
Graham Wilcock | Kristiina Jokinen | Chris Mellish | Ehud Reiter
Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)

Evaluation of an NLG System using Post-Edit Data: Lessons Learnt
Somayajulu Sripada | Ehud Reiter | Lezan Hawizy
Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)

Generating Readable Texts for Readers with Low Basic Skills
Sandra Williams | Ehud Reiter
Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)

2003

Learning the Meaning and Usage of Time Phrases from a Parallel Text-Data Corpus
Ehud Reiter | Somayajulu Sripada
Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-Linguistic Data

Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003
Ehud Reiter | Helmut Horacek | Kees van Deemter
Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003

Acquiring and Using Limited User Models in NLG
Ehud Reiter | Somayajulu Sripada | Sandra Williams
Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003

Experiments with discourse-level choices and readability
Sandra Williams | Ehud Reiter | Liesl Osman
Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003

Summarizing Neonatal Time Series Data
Somayajulu G. Sripada | Ehud Reiter | Jim Hunter | Jin Yu
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

Should Corpora Texts Be Gold Standards for NLG?
Ehud Reiter | Somayajulu Sripada
Proceedings of the International Natural Language Generation Conference

Squibs and Discussions: Human Variation and Lexical Choice
Ehud Reiter | Somayajulu Sripada
Computational Linguistics, Volume 28, Number 4, December 2002

2001

Using a Randomised Controlled Clinical Trial to Evaluate an NLG System
Ehud Reiter | Roma Robertson | A. Scott Lennox | Liesl Osman
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

A Two-Staged Model For Content Determination
Somayajulu G. Sripada | Ehud Reiter | Jim Hunter | Jin Yu
Proceedings of the ACL 2001 Eighth European Workshop on Natural Language Generation (EWNLG)

2000

Pipelines and size constraints
Ehud Reiter
Computational Linguistics, Volume 26, Number 2, June 2000

Knowledge Acquisition for Natural Language Generation
Ehud Reiter | Roma Robertson | Liesl Osman
INLG’2000 Proceedings of the First International Conference on Natural Language Generation

1997

Customizable Descriptions of Object-Oriented Models
Benoit Lavoie | Owen Rambow | Ehud Reiter
Fifth Conference on Applied Natural Language Processing

Tailored Patient Information: Some Issues and Questions
Ehud Reiter
From Research to Commercial Applications: Making NLP Work in Practice

1996

The ModelExplainer
Benoit Lavoie | Owen Rambow | Ehud Reiter
Eighth International Natural Language Generation Workshop (Posters and Demonstrations)

1994

Has a Consensus NL Generation Architecture Appeared, and is it Psycholinguistically Plausible?
Ehud Reiter
Proceedings of the Seventh International Workshop on Natural Language Generation

1992

Using Classification to Generate Text
Ehud Reiter | Chris Mellish
30th Annual Meeting of the Association for Computational Linguistics

A Fast Algorithm for the Generation of Referring Expressions
Ehud Reiter | Robert Dale
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

Automatic Generation of On-Line Documentation in the IDAS Project
Ehud Reiter | Chris Mellish | John Levine
Third Conference on Applied Natural Language Processing

1990

The Computational Complexity of Avoiding Conversational Implicatures
Ehud Reiter
28th Annual Meeting of the Association for Computational Linguistics

A New Model for Lexical Choice for Open-Class Words
Ehud Reiter
Proceedings of the Fifth International Workshop on Natural Language Generation