Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers

Sebastien Montella; Betty Fabre; Tanguy Urvoy; Johannes Heinecke; Lina M. Rojas Barahona

Abstract

The verbalization of RDF triples has grown in popularity with the rising ubiquity of Knowledge Bases (KBs). The RDF triple formalism is a simple and efficient way to store facts at large scale. However, its abstract representation makes it difficult for humans to interpret. To address this, the WebNLG challenge aims to promote automated RDF-to-text generation. We propose to leverage pre-training of the Transformer model on augmented data produced by a data augmentation strategy. Our experimental results show minimum relative increases of 3.73%, 126.05% and 88.16% in BLEU score for seen categories, unseen entities and unseen categories, respectively, over standard training.

Anthology ID: 2020.webnlg-1.9
Volume: Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)
Month: December
Year: 2020
Address: Dublin, Ireland (Virtual)
Editors: Thiago Castro Ferreira, Claire Gardent, Nikolai Ilinykh, Chris van der Lee, Simon Mille, Diego Moussallem, Anastasia Shimorina
Venue: WebNLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 89–99
URL: https://aclanthology.org/2020.webnlg-1.9
Cite (ACL): Sebastien Montella, Betty Fabre, Tanguy Urvoy, Johannes Heinecke, and Lina Rojas-Barahona. 2020. Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers. In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), pages 89–99, Dublin, Ireland (Virtual). Association for Computational Linguistics.
Cite (Informal): Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers (Montella et al., WebNLG 2020)
PDF: https://aclanthology.org/2020.webnlg-1.9.pdf
Data: WebNLG
