Thiago D. Tadeu

Also published as: Thiago Tadeu

2010

SINotas: the Evaluation of a NLG Application
Roberto P. A. Araujo | Rafael L. de Oliveira | Eder M. de Novais | Thiago D. Tadeu | Daniel B. Pereira | Ivandré Paraboni
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

SINotas is a data-to-text NLG application intended to produce short textual reports on students' academic performance from a database conveying their grades, weekly attendance rates and related academic information. Although developed primarily as a testbed for Portuguese Natural Language Generation, SINotas generates reports of interest to both students keen to learn how their professors would describe their efforts, and to the professors themselves, who may benefit from an at-a-glance view of the students' performance. In a traditional machine learning approach, SINotas uses a data-text aligned corpus as training data for decision-tree induction. The current system comprises a series of classifiers that implement major Document Planning subtasks (namely, data interpretation, content selection, within- and between-sentence structuring), and a small surface realisation grammar of Brazilian Portuguese. In this paper we focus on the evaluation work of the system, applying a number of intrinsic and user-based evaluation metrics to a collection of text reports generated from real application data.

Text Generation for Brazilian Portuguese: the Surface Realization Task
Eder Novais | Thiago Tadeu | Ivandré Paraboni
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

Extracting Surface Realisation Templates from Corpora
Thiago D. Tadeu | Eder M. de Novais | Ivandré Paraboni
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In Natural Language Generation (NLG), template-based surface realisation is an effective solution to the problem of producing surface strings from a given semantic representation, but many applications may not be able to provide the input knowledge in the required level of detail, which in turn may limit the use of the available NLG resources. However, if we know in advance what the most likely output sentences are (e.g., because a corpus on the relevant application domain happens to be available), then corpus knowledge may be used to quickly deploy a surface realisation engine for small-scale applications, for which it may be sufficient to select a sentence (in natural language) that resembles the desired output, and then modify some or all of its constituents accordingly. In other words, the application may simply 'point to' an existing sentence in the corpus and specify only the changes that need to take place to obtain the desired surface string. In this paper we describe one such approach to surface realisation, in which we extract syntactically-structured templates from a target corpus, and use these templates to produce existing and modified versions of the target sentences by a combination of canned text and basic dependency-tree operations.

Co-authors

Rafael L. de Oliveira 1

Venues

LREC 2
WS 1
