International Natural Language Generation Conference (2017)
In this paper, we study AMR-to-text generation, framing it as a translation task and comparing two different MT approaches (Phrase-based and Neural MT). We systematically study the effects of 3 AMR preprocessing steps (Delexicalisation, Compression, and Linearisation) applied before the MT phase. Our results show that preprocessing indeed helps, although the benefits differ for the two MT models.
Poetry generation is becoming popular among researchers of Natural Language Generation, Computational Creativity and, broadly, Artificial Intelligence. To produce text that may be regarded as poetry, poetry generation systems are typically knowledge-intensive and have to deal with several levels of language, from lexical to semantics. Interest on the topic resulted in the development of several poetry generators described in the literature, with different features covered or handled differently, by a broad range of alternative approaches, as well as different perspectives on evaluation, another challenging aspect due the underlying subjectivity. This paper surveys intelligent poetry generators around a set of relevant axis for poetry generation – targeted languages, form and content features, techniques, reutilisation of material, and evaluation – and aims to organise work developed on this topic so far.
Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a cross-linguistic comparison of Dutch, English, and German image descriptions. We find that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on the specificity of the descriptions.
We study the task of constructing sports news report automatically from live commentary and focus on content selection. Rather than receiving every piece of text of a sports match before news construction, as in previous related work, we novelly verify the feasibility of a more challenging but more useful setting to generate news report on the fly by treating live text input as a stream. Specifically, we design various scoring functions to address different requirements of the task. The near submodularity of scoring functions makes it possible to adapt efficient greedy algorithms even in stream data settings. Experiments suggest that our proposed framework can already produce comparable results compared with previous work that relies on a supervised learning-to-rank model with heavy feature engineering.
We present a flexible Natural Language Generation approach for Spanish, focused on the surface realisation stage, which integrates an inflection module in order to improve the naturalness and expressivity of the generated language. This inflection module inflects the verbs using an ensemble of trainable algorithms whereas the other types of words (e.g. nouns, determiners, etc) are inflected using hand-crafted rules. We show that our approach achieves 2% higher accuracy than two state-of-art inflection generation approaches. Furthermore, our proposed approach also predicts an extra feature: the inflection of the imperative mood, which was not taken into account by previous work. We also present a user evaluation, where we demonstrate that the proposed method significantly improves the perceived naturalness of the generated language.
Image captioning has evolved into a core task for Natural Language Generation and has also proved to be an important testbed for deep learning approaches to handling multimodal representations. Most contemporary approaches rely on a combination of a convolutional network to handle image features, and a recurrent network to encode linguistic information. The latter is typically viewed as the primary “generation” component. Beyond this high-level characterisation, a CNN+RNN model supports a variety of architectural designs. The dominant model in the literature is one in which visual features encoded by a CNN are “injected” as part of the linguistic encoding process, driving the RNN’s linguistic choices. By contrast, it is possible to envisage an architecture in which visual and linguistic features are encoded separately, and merged at a subsequent stage. In this paper, we address two related questions: (1) Is direct injection the best way of combining multimodal information, or is a late merging alternative better for the image captioning task? (2) To what extent should a recurrent network be viewed as actually generating, rather than simply encoding, linguistic information?
Describing people and characters can be very useful in different contexts, such as computational narrative or image description for the visually impaired. However, a review of the existing literature shows that the automatic generation of people descriptions has not received much attention. Our work focuses on the description of people in snapshots from a 3D environment. First, we have conducted a survey to identify the way in which people describe other people under different conditions. We have used the information extracted from this survey to design several Referring Expression Generation algorithms which produce similar results. We have evaluated these algorithms with users in order to identify which ones generate the best description for specific characters in different situations. The evaluation has shown that, in order to generate good descriptions, a combination of different algorithms has to be used depending on the features and situation of the person to be described.
Co-PoeTryMe is a web application for poetry composition, guided by the user, though with the help of automatic features, such as the generation of full (editable) drafts, as well as the acquisition of additional well-formed lines, or semantically-related words, possibly constrained by the number of syllables, rhyme, or polarity. Towards the final poem, the latter can replace lines or words in the draft.
Current referring expression generation systems mostly deliver their output as one-shot, written expressions. We present on-going work on incremental generation of spoken expressions referring to objects in real-world images. This approach extends upon previous work using the words-as-classifier model for generation. We implement this generator in an incremental dialogue processing framework such that we can exploit an existing interface to incremental text-to-speech synthesis. Our system generates and synthesizes referring expressions while continuously observing non-verbal user reactions.
This talk will present a few NLG systems developed within Thomson Reuters providing information to professionals such as lawyers, accountants or traders. Based on the experience developing these system, I will discuss the usefulness of automatic metrics, crowd-sourced evaluation, corpora studies and expert reviews. I will conclude with exploring the question of whether developers of NLG systems need to follow ethical guidelines and how those guidelines could be established.
For situated agents to effectively engage in natural-language interactions with humans, they must be able to refer to entities such as people, locations, and objects. While classic referring expression generation (REG) algorithms like the Incremental Algorithm (IA) assume perfect, complete, and accessible knowledge of all referents, this is not always possible. In this work, we show how a previously presented consultant framework (which facilitates reference resolution when knowledge is uncertain, heterogeneous and distributed) can be used to extend the IA to produce DIST-PIA, a domain-independent algorithm for REG under uncertain, heterogeneous, and distributed knowledge. We also present a novel framework that can be used to evaluate such REG algorithms without conflating the performance of the algorithm with the performance of classifiers it employs.
There has been continuous growth in the volume and ubiquity of video material. It has become essential to define video semantics in order to aid the searchability and retrieval of this data. We present a framework that produces textual descriptions of video, based on the visual semantic content. Detected action classes rendered as verbs, participant objects converted to noun phrases, visual properties of detected objects rendered as adjectives and spatial relations between objects rendered as prepositions. Further, in cases of zero-shot action recognition, a language model is used to infer a missing verb, aided by the detection of objects and scene settings. These extracted features are converted into textual descriptions using a template-based approach. The proposed video descriptions framework evaluated on the NLDHA dataset using ROUGE scores and human judgment evaluation.
We present PASS, a data-to-text system that generates Dutch soccer reports from match statistics. One of the novel elements of PASS is the fact that the system produces corpus-based texts tailored towards fans of one club or the other, which can most prominently be observed in the tone of voice used in the reports. Furthermore, the system is open source and uses a modular design, which makes it relatively easy for people to add extensions. Human-based evaluation shows that people are generally positive towards PASS in regards to its clarity and fluency, and that the tailoring is accurately recognized in most cases.
Natural Language Generation (NLG) can be used to generate personalized health information, which is especially useful when provided in one’s own language. However, the NLG technique widely used in different domains and languages—templates—was shown to be inapplicable to Bantu languages, due to their characteristic agglutinative structure. We present here our use of the grammar engine NLG technique to generate text in Runyankore, a Bantu language indigenous to Uganda. Our grammar engine adds to previous work in this field with new rules for cardinality constraints, prepositions in roles, the passive, and phonological conditioning. We evaluated the generated text with linguists and non-linguists, who regarded most text as grammatically correct and understandable; and over 60% of them regarded all the text generated by our system to have been authored by a human being.
We use language to talk about the world, and so reference is a crucial property of language. However, modeling reference is particularly difficult, as it involves both continuous and discrete as-pects of language. For instance, referring expressions like “the big mug” or “it” typically contain content words (“big”, “mug”), which are notoriously fuzzy or vague in their meaning, and also fun-ction words (“the”, “it”) that largely serve as discrete pointers. Data-driven, distributed models based on distributional semantics or deep learning excel at the former, but struggle with the latter, and the reverse is true for symbolic models. I present ongoing work on modeling reference with a distribu-ted model aimed at capturing both aspects, and learns to refer directly from reference acts.
We propose a new shared task for tactical data-to-text generation in the domain of source code libraries. Specifically, we focus on text generation of function descriptions from example software projects. Data is drawn from existing resources used for studying the related problem of semantic parser induction, and spans a wide variety of both natural languages and programming languages. In this paper, we describe these existing resources, which will serve as training and development data for the task, and discuss plans for building new independent test sets.
We propose a shared task on multilingual Surface Realization, i.e., on mapping unordered and uninflected universal dependency trees to correctly ordered and inflected sentences in a number of languages. A second deeper input will be available in which, in addition, functional words, fine-grained PoS and morphological information will be removed from the input trees. The first shared task on Surface Realization was carried out in 2011 with a similar setup, with a focus on English. We think that it is time for relaunching such a shared task effort in view of the arrival of Universal Dependencies annotated treebanks for a large number of languages on the one hand, and the increasing dominance of Deep Learning, which proved to be a game changer for NLP, on the other hand.
The WebNLG challenge consists in mapping sets of RDF triples to text. It provides a common benchmark on which to train, evaluate and compare “microplanners”, i.e. generation systems that verbalise a given content by making a range of complex interacting choices including referring expression generation, aggregation, lexicalisation, surface realisation and sentence segmentation. In this paper, we introduce the microplanning task, describe data preparation, introduce our evaluation methodology, analyse participant results and provide a brief description of the participating systems.
I briefly describe some of the commercial work which XXX is doing in referring expression algorithms, and highlight differences between what is commercially important (at least to XXX) and the NLG research literature. In particular, XXX is less interested in generic reference algorithms than in high-quality algorithms for specific types of references, such as components of machines, named entities, and dates.
Integrating surface realization and the generation of referring expressions into a single algorithm can improve the quality of the generated sentences. Existing algorithms for doing this, such as SPUD and CRISP, are search-based and can be slow or incomplete. We offer a chart-based algorithm for integrated sentence generation and demonstrate its runtime efficiency.
We describe SimpleNLG-ES, an adaptation of the SimpleNLG realization library for the Spanish language. Our implementation is based on the bilingual English-French SimpleNLG-EnFr adaptation. The library has been tested using a battery of examples that ensure that the most common syntax, morphology and orthography rules for Spanish are met. The library is currently being used in three different projects for the development of data-to-text systems in the meteorological, statistical data information, and business intelligence application domains.
Corpora of referring expressions elicited from human participants in a controlled environment are an important resource for research on automatic referring expression generation. We here present G-TUNA, a new corpus of referring expressions for German. Using the furniture stimuli set developed for the TUNA and D-TUNA corpora, our corpus extends on these corpora by providing data collected in a simulated driving dual-task setting, and additionally provides exact duration annotations for the spoken referring expressions. This corpus will hence allow researchers to analyze the interaction between referring expression length and speech rate, under conditions where the listener is under high vs. low cognitive load.
There are many domain-specific and language-specific NLG systems, of which it may be possible to adapt to related domains and languages. The languages in the Bantu language family have their own set of features distinct from other major groups, which therefore severely limits the options to bootstrap an NLG system from existing ones. We present here our first proof-of-concept application for knowledge-to-text NLG as a plugin to the Protege 5.x ontology development system, tailored to Runyankore, a Bantu language indigenous to Uganda. It comprises a basic annotation model for linguistic information such as noun class, an implementation of existing verbalisation rules and a CFG for verbs, and a basic interface for data entry.
A fully fledged practical working application for a rule-based NLG system is presented that is able to create non-trivial, human sounding narrative from structured data, in any language and for any topic.
We present two approaches to generate titles for browse pages in five different languages, namely English, German, French, Italian and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generate these browse page titles. In addition, we also present a hybrid approach which uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For the two languages English and German we have access to a large amount of already available rule-based generated and curated titles. For these languages we present an automatic post-editing approach which learns how to post-edit the rule-based titles into curated titles.
Data-to-text generation is very essential and important in machine writing applications. The recent deep learning models, like Recurrent Neural Networks (RNNs), have shown a bright future for relevant text generation tasks. However, rare work has been done for automatic generation of long reviews from user opinions. In this paper, we introduce a deep neural network model to generate long Chinese reviews from aspect-sentiment scores representing users’ opinions. We conduct our study within the framework of encoder-decoder networks, and we propose a hierarchical structure with aligned attention in the Long-Short Term Memory (LSTM) decoder. Experiments show that our model outperforms retrieval based baseline methods, and also beats the sequential generation models in qualitative evaluations.
Most work on automatic generation of narratives, and more specifically suspenseful narrative, has focused on detailed domain-specific modelling of character psychology and plot structure. Recent work in computational linguistics on the automatic learning of narrative schemas suggests an alternative approach that exploits such schemas as a starting point for modelling and measuring suspense. We propose a domain-independent model for tracking suspense in a story which can be used to predict the audience’s suspense response on a sentence-by-sentence basis at the content determination stage of narrative generation. The model lends itself as the theoretical foundation for a suspense module that is compatible with alternative narrative generation theories. The proposal is evaluated by human judges’ normalised average scores correlate strongly with predicted values.
Despite increasing amounts of data and ever improving natural language generation techniques, work on automated journalism is still relatively scarce. In this paper, we explore the field and challenges associated with building a journalistic natural language generation system. We present a set of requirements that should guide system design, including transparency, accuracy, modifiability and transferability. Guided by the requirements, we present a data-driven architecture for automated journalism that is largely domain and language independent. We illustrate its practical application in the production of news articles about the 2017 Finnish municipal elections in three languages, demonstrating the successfulness of the data-driven, modular approach of the design. We then draw some lessons for future automated journalism.
Data augmentation is widely used to train deep neural networks for image classification tasks. Simply flipping images can help learning tremendously by increasing the number of training images by a factor of two. However, little work has been done studying data augmentation in natural language processing. Here, we describe two methods for data augmentation for Visual Question Answering (VQA). The first uses existing semantic annotations to generate new questions. The second method is a generative approach using recurrent neural networks. Experiments show that the proposed data augmentation improves performance of both baseline and state-of-the-art VQA algorithms.
This work proposes an organization of knowledge to facilitate the generation of personalized questions, answers and grammars from web documents. To reduce the human effort needed in the generation of the linguistic resources for a new domain, the general aspects that can be reuse across domains are separated from those more specific. The proposed approach is based on the representation of the main domain concepts as a set of attributes. These attributes are related to a syntactico-semantic taxonomy representing the general relationships between conceptual and linguistic knowledge. User models are incorporated by distinguishing different user groups and relating each group to the appropriate conceptual attributes. Then, the data is extracted from the web documents and represented as instances of the domain concepts. Questions, answers and grammars are generated from these instances.
We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models. We evaluate the model on a large German WMT data set where it significantly outperforms existing models. We also describe a novel search strategy for LM-based word ordering and report results on the English Penn Treebank. Our best model setup outperforms prior work both in terms of speed and quality.
East Asian languages are thought to handle reference differently from languages such as English, particularly in terms of the marking of definiteness and number. We present the first Data-Text corpus for Referring Expressions in Mandarin, and we use this corpus to test some initial hypotheses inspired by the theoretical linguistics literature. Our findings suggest that function words deserve more attention in Referring Expressions Generation than they have so far received, and they have a bearing on the debate about whether different languages make different trade-offs between clarity and brevity.
We propose sentence chunking as a way to reduce the time and memory costs of realization of long sentences. During chunking we divide the semantic representation of a sentence into smaller components which can be processed and recombined without loss of information. Our meaning representation of choice is the Dependency Minimal Recursion Semantics (DMRS). We show that realizing chunks of a sentence and combining the results of such realizations increases the coverage for long sentences, significantly reduces the resources required and does not affect the quality of the realization.
Every time we buy something online, we are confronted with Terms of Services. However, only a few people actually read these terms, before accepting them, often to their disadvantage. In this paper, we present the SaToS browser plugin which summarises and simplifies Terms of Services from German webshops.
Many data-to-text NLG systems work with data sets which are incomplete, ie some of the data is missing. We have worked with data journalists to understand how they describe incomplete data, and are building NLG algorithms based on these insights. A pilot evaluation showed mixed results, and highlighted several areas where we need to improve our system.
Referring expression generation (REG) models that use speaker-dependent information require a considerable amount of training data produced by every individual speaker, or may otherwise perform poorly. In this work we propose a simple personalised method for this task, in which speakers are grouped into profiles according to their referential behaviour. Intrinsic evaluation shows that the use of speaker’s profiles generally outperforms the personalised method found in previous work.
A generation system can only be as good as the data it is trained on. In this short paper, we propose a methodology for analysing data-to-text corpora used for training Natural Language Generation (NLG) systems. We apply this methodology to three existing benchmarks. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text generators.
Monitoring and analysis of complex phenomena attract the attention of both academy and industry. Dealing with data produced by complex phenomena requires the use of advance computational intelligence techniques. Namely, linguistic description of complex phenomena constitutes a mature research line. It is supported by the Computational Theory of Perceptions grounded on the Fuzzy Sets Theory. Its aim is the development of computational systems with the ability to generate vague descriptions of the world in a similar way how humans do. This is a human-centric and multi-disciplinary research work. Moreover, its success is a matter of careful design; thus, developers play a key role. The rLDCP R package was designed to facilitate the development of new applications. This demo introduces the use of rLDCP, for both beginners and advance developers, in practical use cases.
This demo paper presents the multilingual deep sentence generator developed by the TALN group at Universitat Pompeu Fabra, implemented as a series of rule-based graph-transducers for the syntacticization of the input graphs, the resolution of morphological agreements, and the linearization of the trees.
We introduce the properties to be satisfied by measures of referential success of set referring expressions with fuzzy properties. We define families of measures on the basis of n-cardinality measures and we illustrate some of them with a toy example.
We present a neural response generation model that generates responses conditioned on a target personality. The model learns high level features based on the target personality, and uses them to update its hidden state. Our model achieves performance improvements in both perplexity and BLEU scores over a baseline sequence-to-sequence model, and is validated by human judges.
Progress in statistical paraphrase generation has been hindered for a long time by the lack of large monolingual parallel corpora. In this paper, we adapt the neural machine translation approach to paraphrase generation and perform transfer learning from the closely related task of entailment generation. We evaluate the model on the Microsoft Research Paraphrase (MSRP) corpus and show that the model is able to generate sentences that capture part of the original meaning, but fails to pick up on important words or to show large lexical variation.