DipInfo-UniTo at the GEM’24 Data-to-Text Task: Augmenting LLMs with the Split-Generate-Aggregate Pipeline

Michael Oliverio, Pier Felice Balestrucci, Alessandro Mazzei, Valerio Basile


Abstract
This paper describes the DipInfo-UniTo system participating in the GEM 2024 shared task. We participate only in the Data-to-Text (D2T) task. The DipInfo-UniTo system is based on Mistral (Jiang et al., 2023), a recent Large Language Model (LLM). Most LLMs can generate high-quality text for D2T tasks but, crucially, they often fall short in terms of adequacy and sometimes exhibit “hallucinations”. To mitigate this issue, we implemented a generation pipeline that combines LLMs with techniques from the traditional Natural Language Generation (NLG) pipeline. In particular, we use a three-step process, SGA, consisting of (1) Splitting the original set of triples, (2) Generating verbalizations from the resulting split data units, and (3) Aggregating the verbalizations produced in the previous step.
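The three SGA steps can be sketched as a minimal pipeline skeleton. This is an illustrative assumption, not the authors' actual implementation: the function names are hypothetical, and the toy string-based verbalizer and joiner stand in for the LLM calls that the paper uses for the Generate and Aggregate steps.

```python
def split(triples):
    """Step 1 (Split): break the original set of triples into
    single-triple data units."""
    return [[t] for t in triples]


def generate(unit):
    """Step 2 (Generate): verbalize one data unit.
    Toy stand-in for an LLM call."""
    subj, pred, obj = unit[0]
    return f"{subj} {pred} {obj}."


def aggregate(sentences):
    """Step 3 (Aggregate): merge per-unit verbalizations into one text.
    Toy stand-in for an LLM-based aggregation step."""
    return " ".join(sentences)


def sga(triples):
    """Run the full Split-Generate-Aggregate pipeline."""
    units = split(triples)
    return aggregate([generate(u) for u in units])


if __name__ == "__main__":
    triples = [
        ("Turin", "is located in", "Italy"),
        ("Turin", "hosts", "the University of Turin"),
    ]
    print(sga(triples))
```

Splitting before generation keeps each LLM call grounded in a single triple, which is the mechanism the paper uses to reduce adequacy errors and hallucinations.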
Anthology ID:
2024.inlg-genchal.6
Volume:
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Simon Mille, Miruna-Adriana Clinciu
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Note:
Pages:
59–65
URL:
https://aclanthology.org/2024.inlg-genchal.6
Cite (ACL):
Michael Oliverio, Pier Felice Balestrucci, Alessandro Mazzei, and Valerio Basile. 2024. DipInfo-UniTo at the GEM’24 Data-to-Text Task: Augmenting LLMs with the Split-Generate-Aggregate Pipeline. In Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges, pages 59–65, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
DipInfo-UniTo at the GEM’24 Data-to-Text Task: Augmenting LLMs with the Split-Generate-Aggregate Pipeline (Oliverio et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-genchal.6.pdf