Join Together? Combining Data to Parse Italian Texts

Claudia Corbetta, Giovanni Moretti, Marco Passarotti


Abstract
In this paper, we create and evaluate non-combined and combined models using Old and Contemporary Italian data to determine whether enlarging the training data by combining the two could improve parsing accuracy and thereby facilitate manual annotation. We find that, despite the larger training data, in-domain parsing still performs better. Additionally, we find that models trained on Old Italian data perform better on Contemporary Italian data than the reverse. We attempt to explain this result in terms of syntactic complexity, finding that Old Italian texts exhibit greater sentence length and a higher non-projectivity rate.
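The two complexity measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions: each sentence is a well-formed dependency tree given as a list of 1-based head indices (0 marks the root), an arc counts as non-projective when some token lying between head and dependent is not dominated by that head, and the rate is taken per arc. The paper may define the rate differently (for example, per sentence), and the function names below are hypothetical.

```python
from typing import List, Tuple


def is_descendant(heads: List[int], node: int, ancestor: int) -> bool:
    """Return True if `ancestor` dominates `node` (assumes an acyclic tree)."""
    while node != 0:
        node = heads[node - 1]  # move up to the head of the current token
        if node == ancestor:
            return True
    return False


def count_non_projective_arcs(heads: List[int]) -> int:
    """Count arcs spanning at least one token not dominated by the arc's head."""
    count = 0
    for dep, head in enumerate(heads, start=1):
        lo, hi = sorted((dep, head))
        for k in range(lo + 1, hi):
            if not is_descendant(heads, k, head):
                count += 1
                break  # one offending token is enough to flag this arc
    return count


def complexity_stats(sentences: List[List[int]]) -> Tuple[float, float]:
    """Return (mean sentence length, share of non-projective arcs)."""
    total_tokens = sum(len(heads) for heads in sentences)
    if not sentences or total_tokens == 0:
        return 0.0, 0.0
    non_projective = sum(count_non_projective_arcs(h) for h in sentences)
    return total_tokens / len(sentences), non_projective / total_tokens


# Toy input (invented for illustration, not taken from the paper's data):
# the first tree is projective, the second contains one non-projective arc.
toy_sentences = [[2, 0, 2], [3, 4, 0, 3]]
mean_length, np_rate = complexity_stats(toy_sentences)
print(f"mean length = {mean_length:.2f}, non-projective arc rate = {np_rate:.3f}")
```

With real data, the head lists would be read from the HEAD column of CoNLL-U files, skipping multiword-token and empty-node lines.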
Anthology ID:
2024.clicit-1.30
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
251–257
URL:
https://aclanthology.org/2024.clicit-1.30/
Cite (ACL):
Claudia Corbetta, Giovanni Moretti, and Marco Passarotti. 2024. Join Together? Combining Data to Parse Italian Texts. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 251–257, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Join Together? Combining Data to Parse Italian Texts (Corbetta et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.30.pdf