Subcategorization of Italian Verbs with LLMs and T-PAS

Luca Simonetti, Elisabetta Jezek, Guido Vetere


Abstract
This study explores the application of Large Language Models (LLMs) to verb subcategorization in Italian, focusing on the identification and classification of syntactic patterns in sentences. Although LLMs now handle much lexical analysis implicitly, explicit identification of argument structure remains crucial in domain-specific contexts. The research leverages T-PAS, a rich lexical resource for Italian verbs, to fine-tune the open multilingual model Mistral 7B using the Iterative Reasoning Preference Optimization (IRPO) technique. This approach aims to improve the recognition and extraction of verbal patterns from Italian sentences, addressing challenges in resource quality, coverage, and frame extraction methods. By combining curated lexical-semantic resources with neural language models, this work contributes to verb subcategorization, particularly for Italian, and demonstrates the potential of LLMs for refining linguistic analysis tools.
Anthology ID:
2024.clicit-1.99
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
921–928
URL:
https://aclanthology.org/2024.clicit-1.99/
Cite (ACL):
Luca Simonetti, Elisabetta Jezek, and Guido Vetere. 2024. Subcategorization of Italian Verbs with LLMs and T-PAS. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 921–928, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Subcategorization of Italian Verbs with LLMs and T-PAS (Simonetti et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.99.pdf