@inproceedings{anders-etal-2022-ufact,
title = "u{FACT}: Unfaithful Alien-Corpora Training for Semantically Consistent Data-to-Text Generation",
author = "Anders, Tisha and
Coca, Alexandru and
Byrne, Bill",
editor = "Muresan, Smaranda and
Nakov, Preslav and
Villavicencio, Aline",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-acl.223",
doi = "10.18653/v1/2022.findings-acl.223",
pages = "2836--2841",
abstract = "We propose uFACT (Un-Faithful Alien Corpora Training), a training corpus construction method for data-to-text (d2t) generation models. We show that d2t models trained on uFACT datasets generate utterances which represent the semantic content of the data sources more accurately compared to models trained on the target corpus alone. Our approach is to augment the training set of a given target corpus with alien corpora which have different semantic representations. We show that while it is important to have faithful data from the target corpus, the faithfulness of additional corpora only plays a minor role. Consequently, uFACT datasets can be constructed with large quantities of unfaithful data. We show how uFACT can be leveraged to obtain state-of-the-art results on the WebNLG benchmark using METEOR as our performance metric. Furthermore, we investigate the sensitivity of the generation faithfulness to the training corpus structure using the PARENT metric, and provide a baseline for this metric on the WebNLG (Gardent et al., 2017) benchmark to facilitate comparisons with future work.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="anders-etal-2022-ufact">
  <titleInfo>
    <title>uFACT: Unfaithful Alien-Corpora Training for Semantically Consistent Data-to-Text Generation</title>
  </titleInfo>
  <name type="personal">
    <namePart type="given">Tisha</namePart>
    <namePart type="family">Anders</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart type="given">Alexandru</namePart>
    <namePart type="family">Coca</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart type="given">Bill</namePart>
    <namePart type="family">Byrne</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <originInfo>
    <dateIssued>2022-05</dateIssued>
  </originInfo>
  <typeOfResource>text</typeOfResource>
  <relatedItem type="host">
    <titleInfo>
      <title>Findings of the Association for Computational Linguistics: ACL 2022</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Smaranda</namePart>
      <namePart type="family">Muresan</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Preslav</namePart>
      <namePart type="family">Nakov</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Aline</namePart>
      <namePart type="family">Villavicencio</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <originInfo>
      <publisher>Association for Computational Linguistics</publisher>
      <place>
        <placeTerm type="text">Dublin, Ireland</placeTerm>
      </place>
    </originInfo>
    <genre authority="marcgt">conference publication</genre>
  </relatedItem>
  <abstract>We propose uFACT (Un-Faithful Alien Corpora Training), a training corpus construction method for data-to-text (d2t) generation models. We show that d2t models trained on uFACT datasets generate utterances which represent the semantic content of the data sources more accurately compared to models trained on the target corpus alone. Our approach is to augment the training set of a given target corpus with alien corpora which have different semantic representations. We show that while it is important to have faithful data from the target corpus, the faithfulness of additional corpora only plays a minor role. Consequently, uFACT datasets can be constructed with large quantities of unfaithful data. We show how uFACT can be leveraged to obtain state-of-the-art results on the WebNLG benchmark using METEOR as our performance metric. Furthermore, we investigate the sensitivity of the generation faithfulness to the training corpus structure using the PARENT metric, and provide a baseline for this metric on the WebNLG (Gardent et al., 2017) benchmark to facilitate comparisons with future work.</abstract>
  <identifier type="citekey">anders-etal-2022-ufact</identifier>
  <identifier type="doi">10.18653/v1/2022.findings-acl.223</identifier>
  <location>
    <url>https://aclanthology.org/2022.findings-acl.223</url>
  </location>
  <part>
    <date>2022-05</date>
    <extent unit="page">
      <start>2836</start>
      <end>2841</end>
    </extent>
  </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T uFACT: Unfaithful Alien-Corpora Training for Semantically Consistent Data-to-Text Generation
%A Anders, Tisha
%A Coca, Alexandru
%A Byrne, Bill
%Y Muresan, Smaranda
%Y Nakov, Preslav
%Y Villavicencio, Aline
%S Findings of the Association for Computational Linguistics: ACL 2022
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland
%F anders-etal-2022-ufact
%X We propose uFACT (Un-Faithful Alien Corpora Training), a training corpus construction method for data-to-text (d2t) generation models. We show that d2t models trained on uFACT datasets generate utterances which represent the semantic content of the data sources more accurately compared to models trained on the target corpus alone. Our approach is to augment the training set of a given target corpus with alien corpora which have different semantic representations. We show that while it is important to have faithful data from the target corpus, the faithfulness of additional corpora only plays a minor role. Consequently, uFACT datasets can be constructed with large quantities of unfaithful data. We show how uFACT can be leveraged to obtain state-of-the-art results on the WebNLG benchmark using METEOR as our performance metric. Furthermore, we investigate the sensitivity of the generation faithfulness to the training corpus structure using the PARENT metric, and provide a baseline for this metric on the WebNLG (Gardent et al., 2017) benchmark to facilitate comparisons with future work.
%R 10.18653/v1/2022.findings-acl.223
%U https://aclanthology.org/2022.findings-acl.223
%U https://doi.org/10.18653/v1/2022.findings-acl.223
%P 2836-2841
Markdown (Informal)
[uFACT: Unfaithful Alien-Corpora Training for Semantically Consistent Data-to-Text Generation](https://aclanthology.org/2022.findings-acl.223) (Anders et al., Findings 2022)
ACL
Tisha Anders, Alexandru Coca, and Bill Byrne. 2022. uFACT: Unfaithful Alien-Corpora Training for Semantically Consistent Data-to-Text Generation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2836–2841, Dublin, Ireland. Association for Computational Linguistics.