Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity

Li Tang, Simon Clematide


Abstract
Searching for legal documents is a specialized Information Retrieval task that is relevant for expert users (lawyers and their assistants) and for non-expert users. By searching previous court decisions (cases), a user can better prepare the legal reasoning of a new case. Being able to search using a natural language text snippet instead of a more artificial query could help to prevent query formulation issues. Also, if semantic similarity can be modeled beyond exact lexical matches, more relevant results can be found even if the query terms do not match exactly. For this domain, we formulated a task to compare different ways of modeling semantic similarity at paragraph level, using neural and non-neural systems. We compared systems that encode the query and the search collection paragraphs as vectors, enabling the use of cosine similarity for ranking results. After building a German dataset of cases and statutes from Switzerland, and extracting citations from cases to statutes, we developed an algorithm for estimating semantic similarity at paragraph level, using a link-based similarity method. When evaluating different systems in this way, we find that semantic similarity modeling by neural systems can be boosted with an extended attention mask that quenches noise in the inputs.
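The ranking step mentioned in the abstract (query and paragraphs encoded as vectors, then ordered by cosine similarity) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `cosine_similarity` and `rank_paragraphs`, the paragraph ids, and the three-dimensional toy vectors are all assumptions made for the example. In practice, the vectors would come from one of the neural or non-neural encoders the paper compares.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_paragraphs(query_vec, paragraph_vecs):
    """Return (paragraph_id, score) pairs sorted by descending cosine similarity."""
    scored = [
        (pid, cosine_similarity(query_vec, vec))
        for pid, vec in paragraph_vecs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; real ones would be produced by an encoder.
paragraphs = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.7, 0.7, 0.0],
    "p3": [0.0, 0.0, 1.0],
}
ranking = rank_paragraphs([1.0, 0.1, 0.0], paragraphs)
```

Because cosine similarity depends only on the angle between vectors, paragraphs of different lengths can be compared on the same scale once they are encoded.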
Anthology ID:
2021.nllp-1.12
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
114–122
URL:
https://aclanthology.org/2021.nllp-1.12
DOI:
10.18653/v1/2021.nllp-1.12
Cite (ACL):
Li Tang and Simon Clematide. 2021. Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity. In Proceedings of the Natural Legal Language Processing Workshop 2021, pages 114–122, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity (Tang & Clematide, NLLP 2021)
PDF:
https://aclanthology.org/2021.nllp-1.12.pdf