Stamatia Dasiopoulou

2023

Generating Irish Text with a Flexible Plug-and-Play Architecture
Simon Mille | Elaine Uí Dhonnchadha | Lauren Cassidy | Brian Davis | Stamatia Dasiopoulou | Anya Belz
Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning

In this paper, we describe M-FleNS, a multilingual flexible plug-and-play architecture designed to accommodate neural and symbolic modules, and initially instantiated with rule-based modules. We focus on using M-FleNS for the specific purpose of building new resources for Irish, a language currently under-represented in the NLP landscape. We present the general M-FleNS framework and how we use it to build an Irish Natural Language Generation system for verbalising part of the DBpedia ontology and building a multilayered dataset with rich linguistic annotations. Via automatic and human assessments of the output texts we show that with very limited resources we are able to create a system that reaches high levels of fluency and semantic accuracy, while having very low energy and memory requirements.

pdf bib abs

DCU/TCD-FORGe at WebNLG’23: Irish rules! (WegNLG 2023)
Simon Mille | Elaine Uí Dhonnchadha | Stamatia Dasiopoulou | Lauren Cassidy | Brian Davis | Anya Belz
Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023)

In this paper, we describe the submission of Dublin City University (DCU) and Trinity College Dublin (TCD) for the WebNLG 2023 shared task. We present a fully rule-based pipeline for generating Irish texts from DBpedia triple sets which comprises 4 components: triple lexicalisation, generation of noninflected Irish text, inflection generation, and post-processing.

pdf bib abs

Mod-D2T: A Multi-layer Dataset for Modular Data-to-Text Generation
Simon Mille | Francois Lareau | Stamatia Dasiopoulou | Anya Belz
Proceedings of the 16th International Natural Language Generation Conference

Rule-based text generators lack the coverage and fluency of their neural counterparts, but have two big advantages over them: (i) they are entirely controllable and do not hallucinate; and (ii) they can fully explain how an output was generated from an input. In this paper we leverage these two advantages to create large and reliable synthetic datasets with multiple human-intelligible intermediate representations. We present the Modular Data-to-Text (Mod-D2T) Dataset which incorporates ten intermediate-level representations between input triple sets and output text; the mappings from one level to the next can broadly be interpreted as the traditional modular tasks of an NLG pipeline. We describe the Mod-D2T dataset, evaluate its quality via manual validation and discuss its applications and limitations. Data, code and documentation are available at https://github.com/mille-s/Mod-D2T.

2019

pdf bib abs

Teaching FORGe to Verbalize DBpedia Properties in Spanish
Simon Mille | Stamatia Dasiopoulou | Beatriz Fisas | Leo Wanner
Proceedings of the 12th International Conference on Natural Language Generation

Statistical generators increasingly dominate the research in NLG. However, grammar-based generators that are grounded in a solid linguistic framework remain very competitive, especially for generation from deep knowledge structures. Furthermore, if built modularly, they can be ported to other genres and languages with a limited amount of work, without the need of the annotation of a considerable amount of training data. One of these generators is FORGe, which is based on the Meaning-Text Model. In the recent WebNLG challenge (the first comprehensive task addressing the mapping of RDF triples to text) FORGe ranked first with respect to the overall quality in human evaluation. We extend the coverage of FORGE’s open source grammatical and lexical resources for English, so as to further improve the English texts, and port them to Spanish, to achieve a comparable quality. This confirms that, as already observed in the case of SimpleNLG, a robust universal grammar-driven framework and a systematic organization of the linguistic resources can be an adequate choice for NLG applications.

Co-authors

Beatriz Fisas 1

François Lareau 1

Leo Wanner 1

Venues

Fix author