XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages

Shivprasad Sagare; Tushar Abhishek; Bhavyajeet Singh; Anubhav Sharma; Manish Gupta; Vasudeva Varma

doi:10.18653/v1/2023.inlg-main.2

Abstract

Multiple business scenarios require automated generation of descriptive human-readable text from structured input data. This has recently resulted in substantial work on fact-to-text generation systems. Unfortunately, previous work on fact-to-text (F2T) generation has focused primarily on English, mainly due to the high availability of relevant datasets. Only recently was the problem of cross-lingual fact-to-text (XF2T) generation across multiple languages proposed, along with a dataset, XAlign, for eight languages. However, there has been no rigorous work on the actual XF2T generation problem. We extend the XAlign dataset with annotated data for four more languages: Punjabi, Malayalam, Assamese and Oriya. We conduct an extensive study using popular Transformer-based text generation models on our extended multi-lingual dataset, which we call XAlignV2. Further, we investigate the performance of different text generation strategies: multiple variations of pretraining, fact-aware embeddings and structure-aware input encoding. Our extensive experiments show that a multi-lingual mT5 model which uses fact-aware embeddings with structure-aware input encoding leads to the best results (30.90 BLEU, 55.12 METEOR and 59.17 chrF++) across the twelve languages. We make our code, dataset and model publicly available, and hope that this will help advance further research in this critical area.

Anthology ID: 2023.inlg-main.2
Volume: Proceedings of the 16th International Natural Language Generation Conference
Month: September
Year: 2023
Address: Prague, Czechia
Editors: C. Maria Keet, Hung-Yi Lee, Sina Zarrieß
Venues: INLG | SIGDIAL
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 15–27
URL: https://aclanthology.org/2023.inlg-main.2/
DOI: 10.18653/v1/2023.inlg-main.2
Cite (ACL): Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, and Vasudeva Varma. 2023. XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages. In Proceedings of the 16th International Natural Language Generation Conference, pages 15–27, Prague, Czechia. Association for Computational Linguistics.
Cite (Informal): XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages (Sagare et al., INLG-SIGDIAL 2023)
PDF: https://aclanthology.org/2023.inlg-main.2.pdf
