Michael Oliverio
2024
DipInfo-UniTo at the GEM’24 Data-to-Text Task: Augmenting LLMs with the Split-Generate-Aggregate Pipeline
Michael Oliverio
|
Pier Felice Balestrucci
|
Alessandro Mazzei
|
Valerio Basile
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
This paper describes the DipInfo-UniTo system participating to the GEM shared task 2024. We participate only to the Data-to-Text (D2T) task. The DipInfo-UniTo system is based on Mistral (Jiang et al., 2023), a recent Large Language Model (LLM). Most LLMs are capable of generating high-quality text for D2T tasks but, crucially, they often fall short in terms of adequacy, and sometimes exhibit “hallucinations”. To mitigate this issue, we have implemented a generation pipeline that combines LLMs with techniques from the traditional Natural Language Generation (NLG) pipeline. In particular, we have a three step process SGA, consisting in (1) Splitting the original set of triples, (2) Generating verbalizations from the resulting split data units, (3) Aggregating the verbalizations produced in the previous step.