TaKG: A New Dataset for Paragraph-level Table-to-Text Generation Enhanced with Knowledge Graphs

Qianqian Qi, Zhenyun Deng, Yonghua Zhu, Lia Jisoo Lee, Michael Witbrock, Jiamou Liu


Abstract
We introduce TaKG, a new table-to-text generation dataset with the following highlights: (1) TaKG defines a long-text (paragraph-level) generation task, in contrast to well-established datasets that target short-text (sentence-level) generation. (2) TaKG is the first large-scale dataset for this task, covering three application domains and containing ~750,000 samples. (3) To address the divergence phenomenon, TaKG enhances the table input with external knowledge graphs, extracted by a new Wikidata-based method. We then propose a new Transformer-based multimodal sequence-to-sequence architecture for TaKG that integrates two pretrained language models, RoBERTa and GPT-2. Our model shows reliable performance on long-text generation across a variety of metrics, and outperforms existing models on short-text generation tasks.
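As a rough illustration of what "enhancing table input with a knowledge graph" could look like at the input level, the sketch below flattens table cells and knowledge-graph triples into a single string that a sequence-to-sequence encoder might consume. This is not the authors' method: the `linearize` function, the `<cell>` and `<triple>` markers, and the toy record are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A knowledge-graph fact as (subject, predicate, object); hypothetical format.
Triple = Tuple[str, str, str]


def linearize(table: Dict[str, str], triples: List[Triple]) -> str:
    """Flatten table cells and KG triples into one encoder input string.

    The <cell> and <triple> markers are illustrative assumptions,
    not tokens defined by the TaKG paper.
    """
    cells = [f"<cell> {key} : {value}" for key, value in table.items()]
    facts = [f"<triple> {s} | {p} | {o}" for s, p, o in triples]
    return " ".join(cells + facts)


# Toy example: an infobox-style table plus one external fact.
table = {"name": "Ada Lovelace", "occupation": "mathematician"}
triples = [("Ada Lovelace", "place of birth", "London")]

encoder_input = linearize(table, triples)
print(encoder_input)
```

In a setup like this, the external triples give the generator facts that appear in the reference paragraph but not in the table itself, which is one plausible way to reduce divergence between input and target text.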
Anthology ID:
2022.findings-aacl.17
Volume:
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
Month:
November
Year:
2022
Address:
Online only
Editors:
Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
176–187
URL:
https://aclanthology.org/2022.findings-aacl.17
DOI:
10.18653/v1/2022.findings-aacl.17
Cite (ACL):
Qianqian Qi, Zhenyun Deng, Yonghua Zhu, Lia Jisoo Lee, Michael Witbrock, and Jiamou Liu. 2022. TaKG: A New Dataset for Paragraph-level Table-to-Text Generation Enhanced with Knowledge Graphs. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 176–187, Online only. Association for Computational Linguistics.
Cite (Informal):
TaKG: A New Dataset for Paragraph-level Table-to-Text Generation Enhanced with Knowledge Graphs (Qi et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-aacl.17.pdf