Niko Dalla Noce
2026
JuriFindIT: an Italian legal retrieval dataset
Niko Dalla Noce | Davide Colla | Sina Farhang Doust | Lorenzo De Mattei | Davide Bacciu
Findings of the Association for Computational Linguistics: EACL 2026
Statutory article retrieval (SAR) targets the retrieval of legislative provisions relevant to a natural language question. The lexical gap between everyday queries and specialized legal language, together with the structural dependencies of statute law, makes it a challenging task. Here, we introduce JuriFindIT, the first SAR dataset for the Italian legal domain and the first to explicitly encode cross-article references extracted from national legal code. The dataset covers four macro-areas (civil law, criminal law, anti-money laundering and counter-terrorism, and privacy) and includes 895 expert-authored questions and 169,301 generated ones, linked to more than 23,000 statutory articles. We provide retrieval models fine-tuned on JuriFindIT and propose a pipeline that integrates dense encoders with a heterogeneous legislative graph, achieving consistent improvements over prior SAR approaches.