DCU-NLG-PBN at the GEM’24 Data-to-Text Task: Open-Source LLM PEFT-Tuning for Effective Data-to-Text Generation

Michela Lorandi, Anya Belz


Abstract
LLMs have been used with impressive success in a variety of tasks, including data-to-text generation. However, one concern when LLMs are compared to alternative methods is data contamination: for many datasets, publicly available test sets may have been included in the data used to train these models. In this paper, we explore the performance of LLMs on newly constructed datasets in the context of data-to-text generation for English, Chinese, German, Russian, Spanish, Korean, Hindi, Swahili, and Arabic. We first ran a testing phase to evaluate a range of prompt types and a fine-tuning technique on Mistral 7B and Falcon 40B. We then fully evaluated the most promising system for each scenario: (i) LLM prompting in English followed by translation, and (ii) LLM PEFT-tuning in English followed by translation. We find that fine-tuning Mistral outperforms all other tested systems and achieves performance close to GPT-3.5. Among the prompting strategies, few-shot prompting with dynamic example selection achieves the best results. The human evaluation to be carried out by the shared-task organisers will provide further insight into system performance on the new datasets. In conclusion, we find that fine-tuning an open-source LLM can achieve performance close to that of a state-of-the-art closed-source LLM while using considerably fewer resources.
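The abstract does not specify the exact PEFT configuration used. As a rough illustration of LoRA-style PEFT-tuning of Mistral 7B with Hugging Face transformers and peft, here is a minimal sketch; the checkpoint name, rank, alpha, and target modules are illustrative assumptions, not the paper's reported settings:

```python
# Minimal LoRA PEFT setup for Mistral 7B (sketch; all hyperparameters are
# illustrative assumptions, not the configuration reported in the paper).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=16,                                 # low-rank adapter dimension (assumed)
    lora_alpha=32,                        # adapter scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# Training then proceeds with a standard causal-LM objective on
# (input data, reference text) pairs, e.g. via transformers.Trainer.
```

Similarly, the "dynamic selection of examples" for few-shot prompting is not detailed here. One common realisation is similarity-based retrieval of in-context examples, sketched below under the assumption of sentence-embedding similarity; the encoder choice, pool format, and k are hypothetical:

```python
# Hypothetical dynamic few-shot example selection via embedding similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def select_examples(test_input: str, pool: list[dict], k: int = 3) -> list[dict]:
    """Return the k pool items whose inputs are most similar to test_input."""
    query = encoder.encode([test_input], normalize_embeddings=True)
    cands = encoder.encode([ex["input"] for ex in pool], normalize_embeddings=True)
    scores = cands @ query[0]        # cosine similarity (vectors are unit-norm)
    top = np.argsort(-scores)[:k]
    return [pool[i] for i in top]
```

The selected examples would then be concatenated into the prompt ahead of the test input.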
Anthology ID:
2024.inlg-genchal.8
Volume:
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Simon Mille, Miruna-Adriana Clinciu
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
76–83
URL:
https://aclanthology.org/2024.inlg-genchal.8
Cite (ACL):
Michela Lorandi and Anya Belz. 2024. DCU-NLG-PBN at the GEM’24 Data-to-Text Task: Open-Source LLM PEFT-Tuning for Effective Data-to-Text Generation. In Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges, pages 76–83, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
DCU-NLG-PBN at the GEM’24 Data-to-Text Task: Open-Source LLM PEFT-Tuning for Effective Data-to-Text Generation (Lorandi & Belz, INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-genchal.8.pdf