Niko Dalla Noce


2026

Statutory article retrieval (SAR) aims to retrieve the legislative provisions relevant to a question posed in natural language. The task is challenging because of the lexical gap between everyday queries and specialized legal language, and because of the structural dependencies within statute law. Here, we introduce JuriFindIT, the first SAR dataset for the Italian legal domain and the first to explicitly encode cross-article references extracted from national legal codes. The dataset covers four macro-areas (civil law, criminal law, anti-money laundering and counter-terrorism, and privacy). It includes 895 expert-authored questions and 169,301 generated ones, linked to more than 23,000 statutory articles. We provide retrieval models fine-tuned on JuriFindIT and propose a pipeline that integrates dense encoders with a heterogeneous legislative graph, achieving consistent improvements over prior SAR approaches.