Dimitra Gkatzia


2022

pdf bib
Task2Dial: A Novel Task and Dataset for Commonsense-enhanced Task-based Dialogue Grounded in Documents
Carl Strathearn | Dimitra Gkatzia
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

This paper proposes a novel task on commonsense-enhanced task-based dialogue grounded in documents and describes the Task2Dial dataset, a novel dataset of document-grounded task-based dialogues, where an Information Giver (IG) provides instructions (by consulting a document) to an Information Follower (IF), so that the latter can successfully complete the task. In this unique setting, the IF can ask clarification questions which may not be grounded in the underlying document and require commonsense knowledge to be answered. The Task2Dial dataset poses new challenges: (1) its human reference texts show more lexical richness and variation than other document-grounded dialogue datasets; (2) generating from this set requires paraphrasing as instructional responses might have been modified from the underlying document; (3) requires commonsense knowledge, since questions might not necessarily be grounded in the document; (4) generating requires planning based on context, as task steps need to be provided in order. The Task2Dial dataset contains dialogues with an average 18.15 number of turns and 19.79 tokens per turn, as compared to 12.94 and 12 respectively in existing datasets. As such, learning from this dataset promises more natural, varied and less template-like system utterances.

2021

pdf bib
Task2Dial Dataset: A Novel Dataset for Commonsense-enhanced Task-based Dialogue Grounded in Documents
Carl Strathearn | Dimitra Gkatzia
Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)

pdf bib
Chefbot: A Novel Framework for the Generation of Commonsense-enhanced Responses for Task-based Dialogue Systems
Carl Strathearn | Dimitra Gkatzia
Proceedings of the 14th International Conference on Natural Language Generation

Conversational systems aim to generate responses that are accurate, relevant and engaging, either through utilising neural end-to-end models or through slot filling. Human-to-human conversations are enhanced by not only the latest utterance of the interlocutor, but also by recalling relevant information about concepts/objects covered in the dialogue and integrating them into their responses. Such information may contain recent referred concepts, commonsense knowledge and more. A concrete scenario of such dialogues is the cooking scenario, i.e. when an artificial agent (personal assistant, robot, chatbot) and a human converse about a recipe. We will demo a novel system for commonsense enhanced response generation in the scenario of cooking, where the conversational system is able to not only provide directions for cooking step-by-step, but also display commonsense capabilities by offering explanations of how objects can be used and provide recommendations for replacing ingredients.

pdf bib
Underreporting of errors in NLG output, and what to do about it
Emiel van Miltenburg | Miruna Clinciu | Ondřej Dušek | Dimitra Gkatzia | Stephanie Inglis | Leo Leppänen | Saad Mahamood | Emma Manning | Stephanie Schoch | Craig Thomson | Luou Wen
Proceedings of the 14th International Conference on Natural Language Generation

We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses that are exhibited by ‘state-of-the-art’ research. Next to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis and reporting.

pdf bib
It’s Commonsense, isn’t it? Demystifying Human Evaluations in Commonsense-Enhanced NLG Systems
Miruna-Adriana Clinciu | Dimitra Gkatzia | Saad Mahamood
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

Common sense is an integral part of human cognition which allows us to make sound decisions, communicate effectively with others and interpret situations and utterances. Endowing AI systems with commonsense knowledge capabilities will help us get closer to creating systems that exhibit human intelligence. Recent efforts in Natural Language Generation (NLG) have focused on incorporating commonsense knowledge through large-scale pre-trained language models or by incorporating external knowledge bases. Such systems exhibit reasoning capabilities without common sense being explicitly encoded in the training set. These systems require careful evaluation, as they incorporate additional resources during training which adds additional sources of errors. Additionally, human evaluation of such systems can have significant variation, making it impossible to compare different systems and define baselines. This paper aims to demystify human evaluations of commonsense-enhanced NLG systems by proposing the Commonsense Evaluation Card (CEC), a set of recommendations for evaluation reporting of commonsense-enhanced NLG systems, underpinned by an extensive analysis of human evaluations reported in the recent literature.

pdf bib
CAPE: Context-Aware Private Embeddings for Private Language Learning
Richard Plant | Dimitra Gkatzia | Valerio Giuffrida
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Neural language models have contributed to state-of-the-art results in a number of downstream applications including sentiment analysis, intent classification and others. However, obtaining text representations or embeddings using these models risks encoding personally identifiable information learned from language and context cues that may lead to privacy leaks. To ameliorate this issue, we propose Context-Aware Private Embeddings (CAPE), a novel approach which combines differential privacy and adversarial learning to preserve privacy during training of embeddings. Specifically, CAPE firstly applies calibrated noise through differential privacy to maintain the privacy of text representations by preserving the encoded semantic links while obscuring sensitive information. Next, CAPE employs an adversarial training regime that obscures identified private variables. Experimental results demonstrate that our proposed approach is more effective in reducing private information leakage than either single intervention, with approximately a 3% reduction in attacker performance compared to the best-performing current method.

pdf bib
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Dimitra Gkatzia | Djamé Seddah
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

2020

pdf bib
Improving the Naturalness and Diversity of Referring Expression Generation models using Minimum Risk Training
Nikolaos Panagiaris | Emma Hart | Dimitra Gkatzia
Proceedings of the 13th International Conference on Natural Language Generation

In this paper we consider the problem of optimizing neural Referring Expression Generation (REG) models with sequence level objectives. Recently reinforcement learning (RL) techniques have been adopted to train deep end-to-end systems to directly optimize sequence-level objectives. However, there are two issues associated with RL training: (1) effectively applying RL is challenging, and (2) the generated sentences lack in diversity and naturalness due to deficiencies in the generated word distribution, smaller vocabulary size, and repetitiveness of frequent words and phrases. To alleviate these issues, we propose a novel strategy for training REG models, using minimum risk training (MRT) with maximum likelihood estimation (MLE) and we show that our approach outperforms RL w.r.t naturalness and diversity of the output. Specifically, our approach achieves an increase in CIDEr scores between 23%-57% in two datasets. We further demonstrate the robustness of the proposed method through a detailed comparison with different REG models.

pdf bib
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions
David M. Howcroft | Anya Belz | Miruna-Adriana Clinciu | Dimitra Gkatzia | Sadid A. Hasan | Saad Mahamood | Simon Mille | Emiel van Miltenburg | Sashank Santhanam | Verena Rieser
Proceedings of the 13th International Conference on Natural Language Generation

Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility. In this paper, we present (i) our dataset of 165 NLG papers with human evaluations, (ii) the annotation scheme we developed to label the papers for different aspects of evaluations, (iii) quantitative analyses of the annotations, and (iv) a set of recommendations for improving standards in evaluation reporting. We use the annotations as a basis for examining information included in evaluation reports, and levels of consistency in approaches, experimental design and terminology, focusing in particular on the 200+ different terms that have been used for evaluated aspects of quality. We conclude that due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG presents as extremely confused in 2020, and that the field is in urgent need of standard methods and terminology.

pdf bib
Proceedings of the 1st Workshop on Evaluating NLG Evaluation
Shubham Agarwal | Ondřej Dušek | Sebastian Gehrmann | Dimitra Gkatzia | Ioannis Konstas | Emiel Van Miltenburg | Sashank Santhanam
Proceedings of the 1st Workshop on Evaluating NLG Evaluation

2018

pdf bib
Proceedings of the Workshop on NLG for Human–Robot Interaction
Mary Ellen Foster | Hendrik Buschmeier | Dimitra Gkatzia
Proceedings of the Workshop on NLG for Human–Robot Interaction

pdf bib
Learning from limited datasets: Implications for Natural Language Generation and Human-Robot Interaction
Jekaterina Belakova | Dimitra Gkatzia
Proceedings of the Workshop on NLG for Human–Robot Interaction

One of the most natural ways for human robot communication is through spoken language. Training human-robot interaction systems require access to large datasets which are expensive to obtain and labour intensive. In this paper, we describe an approach for learning from minimal data, using as a toy example language understanding in spoken dialogue systems. Understanding of spoken language is crucial because it has implications for natural language generation, i.e. correctly understanding a user’s utterance will lead to choosing the right response/action. Finally, we discuss implications for Natural Language Generation in Human-Robot Interaction.

2017

pdf bib
Improving the Naturalness and Expressivity of Language Generation for Spanish
Cristina Barros | Dimitra Gkatzia | Elena Lloret
Proceedings of the 10th International Conference on Natural Language Generation

We present a flexible Natural Language Generation approach for Spanish, focused on the surface realisation stage, which integrates an inflection module in order to improve the naturalness and expressivity of the generated language. This inflection module inflects the verbs using an ensemble of trainable algorithms whereas the other types of words (e.g. nouns, determiners, etc) are inflected using hand-crafted rules. We show that our approach achieves 2% higher accuracy than two state-of-art inflection generation approaches. Furthermore, our proposed approach also predicts an extra feature: the inflection of the imperative mood, which was not taken into account by previous work. We also present a user evaluation, where we demonstrate that the proposed method significantly improves the perceived naturalness of the generated language.

pdf bib
Inflection Generation for Spanish Verbs using Supervised Learning
Cristina Barros | Dimitra Gkatzia | Elena Lloret
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a novel supervised approach to inflection generation for verbs in Spanish. Our system takes as input the verb’s lemma form and the desired features such as person, number, tense, and is able to predict the appropriate grammatical conjugation. Even though our approach learns from fewer examples comparing to previous work, it is able to deal with all the Spanish moods (indicative, subjunctive and imperative) in contrast to previous work which only focuses on indicative and subjunctive moods. We show that in an intrinsic evaluation, our system achieves 99% accuracy, outperforming (although not significantly) two competitive state-of-art systems. The successful results obtained clearly indicate that our approach could be integrated into wider approaches related to text generation in Spanish.

2016

pdf bib
Proceedings of the 9th International Natural Language Generation conference
Amy Isard | Verena Rieser | Dimitra Gkatzia
Proceedings of the 9th International Natural Language Generation conference

pdf bib
Natural Language Generation enhances human decision-making with uncertain information
Dimitra Gkatzia | Oliver Lemon | Verena Rieser
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
The REAL Corpus: A Crowd-Sourced Corpus of Human Generated and Evaluated Spatial References to Real-World Urban Scenes
Phil Bartie | William Mackaness | Dimitra Gkatzia | Verena Rieser
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Our interest is in people’s capacity to efficiently and effectively describe geographic objects in urban scenes. The broader ambition is to develop spatial models capable of equivalent functionality able to construct such referring expressions. To that end we present a newly crowd-sourced data set of natural language references to objects anchored in complex urban scenes (In short: The REAL Corpus ― Referring Expressions Anchored Language). The REAL corpus contains a collection of images of real-world urban scenes together with verbal descriptions of target objects generated by humans, paired with data on how successful other people were able to identify the same object based on these descriptions. In total, the corpus contains 32 images with on average 27 descriptions per image and 3 verifications for each description. In addition, the corpus is annotated with a variety of linguistically motivated features. The paper highlights issues posed by collecting data using crowd-sourcing with an unrestricted input format, as well as using real-world urban scenes.

2015

pdf bib
From the Virtual to the RealWorld: Referring to Objects in Real-World Spatial Scenes
Dimitra Gkatzia | Verena Rieser | Phil Bartie | William Mackaness
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Snapshot of NLG Evaluation Practices 2005 - 2014
Dimitra Gkatzia | Saad Mahamood
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

pdf bib
Generating and Evaluating Landmark-Based Navigation Instructions in Virtual Environments
Amanda Cercas Curry | Dimitra Gkatzia | Verena Rieser
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

pdf bib
A Game-Based Setup for Data Collection and Task-Based Evaluation of Uncertain Information Presentation
Dimitra Gkatzia | Amanda Cercas Curry | Verena Rieser | Oliver Lemon
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

2014

pdf bib
Multi-adaptive Natural Language Generation using Principal Component Regression
Dimitra Gkatzia | Helen Hastie | Oliver Lemon
Proceedings of the 8th International Natural Language Generation Conference (INLG)

pdf bib
Finding middle ground? Multi-objective Natural Language Generation from time-series data
Dimitra Gkatzia | Helen Hastie | Oliver Lemon
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf bib
Comparing Multi-label Classification with Reinforcement Learning for Summarisation of Time-series Data
Dimitra Gkatzia | Helen Hastie | Oliver Lemon
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Generating Student Feedback from Time-Series Data Using Reinforcement Learning
Dimitra Gkatzia | Helen Hastie | Srinivasan Janarthanam | Oliver Lemon
Proceedings of the 14th European Workshop on Natural Language Generation