WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques

Michael Oliverio; Pier Felice Balestrucci; Alessandro Mazzei; Valerio Basile

doi:10.18653/v1/2025.findings-acl.625

Abstract

The main goal of this work is the creation of the Italian version of the WebNLG corpus through the application of Neural Machine Translation (NMT) and post-editing with hand-written rules. To achieve this goal, in a first step, several existing NMT models were analysed and compared in order to identify the system with the highest performance on the original corpus. In a second step, after using the best NMT system, we semi-automatically designed and applied a number of rules to refine and improve the quality of the produced resource, creating a new corpus named WebNLG-IT. We used this resource for fine-tuning several LLMs for RDF-to-text tasks. In this way, comparing the performance of LLM-based generators on both Italian and English, we have (1) evaluated the quality of WebNLG-IT with respect to the original English version, (2) released the first fine-tuned LLM-based system for generating Italian from semantic web triples and (3) introduced an Italian version of a modular generation pipeline for RDF-to-text.
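The two-step procedure described in the abstract (select the best-scoring NMT system, then refine its output with hand-written rules) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the score dictionary, and the three regex rules are all hypothetical, since the abstract does not list the actual metrics or rules used.

```python
import re
from statistics import mean

def pick_best_system(scores):
    """Return the name of the system with the highest mean score.

    `scores` maps a (hypothetical) system name to a list of per-sentence
    quality scores from any automatic MT metric.
    """
    return max(scores, key=lambda name: mean(scores[name]))

# Hypothetical post-editing rules, applied in order. These are stand-ins
# for illustration only, not the rules designed in the paper.
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),  # drop whitespace before punctuation
    (re.compile(r"\s{2,}"), " "),           # collapse repeated whitespace
    (re.compile(r"\bdi il\b"), "del"),      # contract preposition + article
]

def post_edit(text):
    """Apply every rule in RULES to a machine-translated sentence."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()

if __name__ == "__main__":
    scores = {"system_a": [0.5, 0.6], "system_b": [0.7, 0.8]}
    print(pick_best_system(scores))
    print(post_edit("Il sindaco di il comune  è Mario ."))
```

Keeping the rules as an ordered list of pattern and replacement pairs makes it easy to add, remove, or reorder rules while inspecting their effect on the translated corpus.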

Anthology ID: 2025.findings-acl.625
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12073–12083
URL: https://aclanthology.org/2025.findings-acl.625/
DOI: 10.18653/v1/2025.findings-acl.625
Cite (ACL): Michael Oliverio, Pier Felice Balestrucci, Alessandro Mazzei, and Valerio Basile. 2025. WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques. In Findings of the Association for Computational Linguistics: ACL 2025, pages 12073–12083, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques (Oliverio et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.625.pdf
