Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers

Sebastien Montella, Betty Fabre, Tanguy Urvoy, Johannes Heinecke, Lina Rojas-Barahona


Abstract
The verbalization of RDF triples has grown in popularity with the rising ubiquity of Knowledge Bases (KBs). RDF triples are a simple and efficient formalism for storing facts at a large scale. However, their abstract representation makes them difficult for humans to interpret. To address this, the WebNLG challenge promotes automated RDF-to-text generation. We propose to leverage pre-training on augmented data, obtained through a data augmentation strategy, with the Transformer model. Our experimental results show minimum relative increases of 3.73%, 126.05% and 88.16% in BLEU score for seen categories, unseen entities and unseen categories, respectively, over standard training.
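As a rough illustration of two ideas in the abstract, the sketch below shows one way a set of RDF triples could be flattened into a single input string for a sequence-to-sequence model, and how a relative increase in a score such as BLEU is computed. The marker tokens `<S>`, `<P>`, `<O>` and the function names `linearize` and `relative_increase` are assumptions made for this example; they are not taken from the paper, and the authors' actual preprocessing may differ.

```python
from typing import Iterable, Tuple

# A triple is (subject, predicate, object), the basic unit of RDF.
Triple = Tuple[str, str, str]


def linearize(triples: Iterable[Triple]) -> str:
    """Flatten triples into one string a text-to-text model could consume."""
    parts = []
    for subj, pred, obj in triples:
        # <S>, <P>, <O> are hypothetical marker tokens chosen for illustration.
        parts.append(f"<S> {subj} <P> {pred} <O> {obj}")
    return " ".join(parts)


def relative_increase(baseline: float, improved: float) -> float:
    """Relative increase of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0
```

For example, `linearize([("Alan_Bean", "occupation", "Test_pilot")])` yields `<S> Alan_Bean <P> occupation <O> Test_pilot`, which a trained model would be expected to verbalize as a sentence. The relative increases quoted in the abstract are of the form `(improved - baseline) / baseline`, so a value above 100% means the score more than doubled relative to standard training.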
Anthology ID:
2020.webnlg-1.9
Volume:
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)
Month:
12
Year:
2020
Address:
Dublin, Ireland (Virtual)
Editors:
Thiago Castro Ferreira, Claire Gardent, Nikolai Ilinykh, Chris van der Lee, Simon Mille, Diego Moussallem, Anastasia Shimorina
Venue:
WebNLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
89–99
URL:
https://aclanthology.org/2020.webnlg-1.9
Cite (ACL):
Sebastien Montella, Betty Fabre, Tanguy Urvoy, Johannes Heinecke, and Lina Rojas-Barahona. 2020. Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers. In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), pages 89–99, Dublin, Ireland (Virtual). Association for Computational Linguistics.
Cite (Informal):
Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers (Montella et al., WebNLG 2020)
PDF:
https://aclanthology.org/2020.webnlg-1.9.pdf
Data
WebNLG