XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages

Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma


Abstract
Multiple business scenarios require an automated generation of descriptive human-readable text from structured input data. This has resulted into substantial work on fact-to-text generation systems recently. Unfortunately, previous work on fact-to-text (F2T) generation has focused primarily on English mainly due to the high availability of relevant datasets. Only recently, the problem of cross-lingual fact-to-text (XF2T) was proposed for generation across multiple languages alongwith a dataset, XAlign for eight languages. However, there has been no rigorous work on the actual XF2T generation problem. We extend XAlign dataset with annotated data for four more languages: Punjabi, Malayalam, Assamese and Oriya. We conduct an extensive study using popular Transformer-based text generation models on our extended multi-lingual dataset, which we call XAlignV2. Further, we investigate the performance of different text generation strategies: multiple variations of pretraining, fact-aware embeddings and structure-aware input encoding. Our extensive experiments show that a multi-lingual mT5 model which uses fact-aware embeddings with structure-aware input encoding leads to best results (30.90 BLEU, 55.12 METEOR and 59.17 chrF++) across the twelve languages. We make our code, dataset and model publicly available, and hope that this will help advance further research in this critical area.
Anthology ID:
2023.inlg-main.2
Volume:
Proceedings of the 16th International Natural Language Generation Conference
Month:
September
Year:
2023
Address:
Prague, Czechia
Editors:
C. Maria Keet, Hung-Yi Lee, Sina Zarrieß
Venues:
INLG | SIGDIAL
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Note:
Pages:
15–27
Language:
URL:
https://aclanthology.org/2023.inlg-main.2
DOI:
10.18653/v1/2023.inlg-main.2
Bibkey:
Cite (ACL):
Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, and Vasudeva Varma. 2023. XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages. In Proceedings of the 16th International Natural Language Generation Conference, pages 15–27, Prague, Czechia. Association for Computational Linguistics.
Cite (Informal):
XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages (Sagare et al., INLG-SIGDIAL 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.inlg-main.2.pdf