@inproceedings{jobanputra-demberg-2024-teamsaarlst,
    title = "{T}eam{S}aar{LST} at the {GEM}{'}24 Data-to-text Task: Revisiting symbolic retrieval in the {LLM}-age",
    author = "Jobanputra, Mayank and
      Demberg, Vera",
    editor = "Mille, Simon and
      Clinciu, Miruna-Adriana",
    booktitle = "Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges",
    month = sep,
    year = "2024",
    address = "Tokyo, Japan",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.inlg-genchal.10",
    pages = "92--99",
abstract = "Data-to-text (D2T) generation is a natural language generation (NLG) task in which a system describes structured data in natural language. Generating natural language verbalization for structured data is challenging as the data may not contain all the required details (here, properties such as gender are missing from the input data and need to be inferred for correct language generation), and because the structured data may conflict with the knowledge contained in the LLM{'}s parameters learned during pre-training. Both of these factors (incorrect filling in of details, pretraining conflict and input data) can lead to so-called hallucinations. In this paper, we propose a few-shot retrieval augmented generation (RAG) system, using a symbolic retriever {--} PropertyRetriever. Additionally, we experiment with state-of-the-art large language models (LLMs) to generate data verbalizations. Our system achieves the best results on 4 out of 6 subtasks for METEOR and chrF++ metrics. We present our results along with an error analysis. We release our code for reproducing the results as well as the generated verbalizations from all the experiments for any further explorations here.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="jobanputra-demberg-2024-teamsaarlst">
<titleInfo>
<title>TeamSaarLST at the GEM’24 Data-to-text Task: Revisiting symbolic retrieval in the LLM-age</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mayank</namePart>
<namePart type="family">Jobanputra</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vera</namePart>
<namePart type="family">Demberg</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-09</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges</title>
</titleInfo>
<name type="personal">
<namePart type="given">Simon</namePart>
<namePart type="family">Mille</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Miruna-Adriana</namePart>
<namePart type="family">Clinciu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Tokyo, Japan</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Data-to-text (D2T) generation is a natural language generation (NLG) task in which a system describes structured data in natural language. Generating natural language verbalization for structured data is challenging as the data may not contain all the required details (here, properties such as gender are missing from the input data and need to be inferred for correct language generation), and because the structured data may conflict with the knowledge contained in the LLM’s parameters learned during pre-training. Both of these factors (incorrect filling in of details, pretraining conflict and input data) can lead to so-called hallucinations. In this paper, we propose a few-shot retrieval augmented generation (RAG) system, using a symbolic retriever – PropertyRetriever. Additionally, we experiment with state-of-the-art large language models (LLMs) to generate data verbalizations. Our system achieves the best results on 4 out of 6 subtasks for METEOR and chrF++ metrics. We present our results along with an error analysis. We release our code for reproducing the results as well as the generated verbalizations from all the experiments for any further explorations here.</abstract>
<identifier type="citekey">jobanputra-demberg-2024-teamsaarlst</identifier>
<location>
<url>https://aclanthology.org/2024.inlg-genchal.10</url>
</location>
<part>
<date>2024-09</date>
<extent unit="page">
<start>92</start>
<end>99</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T TeamSaarLST at the GEM’24 Data-to-text Task: Revisiting symbolic retrieval in the LLM-age
%A Jobanputra, Mayank
%A Demberg, Vera
%Y Mille, Simon
%Y Clinciu, Miruna-Adriana
%S Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
%D 2024
%8 September
%I Association for Computational Linguistics
%C Tokyo, Japan
%F jobanputra-demberg-2024-teamsaarlst
%X Data-to-text (D2T) generation is a natural language generation (NLG) task in which a system describes structured data in natural language. Generating natural language verbalizations for structured data is challenging because the data may not contain all the required details (here, properties such as gender are missing from the input data and need to be inferred for correct language generation), and because the structured data may conflict with the knowledge contained in the LLM’s parameters, learned during pre-training. Both of these factors (incorrectly filled-in details, and conflicts between pretraining knowledge and the input data) can lead to so-called hallucinations. In this paper, we propose a few-shot retrieval-augmented generation (RAG) system that uses a symbolic retriever, PropertyRetriever. Additionally, we experiment with state-of-the-art large language models (LLMs) to generate data verbalizations. Our system achieves the best results on 4 out of 6 subtasks on the METEOR and chrF++ metrics. We present our results along with an error analysis. We release our code for reproducing the results, as well as the generated verbalizations from all experiments, for further exploration here.
%U https://aclanthology.org/2024.inlg-genchal.10
%P 92-99
Markdown (Informal)
[TeamSaarLST at the GEM’24 Data-to-text Task: Revisiting symbolic retrieval in the LLM-age](https://aclanthology.org/2024.inlg-genchal.10) (Jobanputra & Demberg, INLG 2024)
ACL
Mayank Jobanputra and Vera Demberg. 2024. TeamSaarLST at the GEM’24 Data-to-text Task: Revisiting symbolic retrieval in the LLM-age. In Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges, pages 92–99, Tokyo, Japan. Association for Computational Linguistics.