Akshita Gheewala
2024
PSILENCE: A Pseudonymization Tool for International Law
Luis Adrián Cabrera-Diego
|
Akshita Gheewala
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
Since the announcement of the GDPR, the pseudonymization of legal documents has become a high-priority task in many legal organizations. This means that for making public a document, it is necessary to redact the identity of certain entities, such as witnesses. In this work, we present the first results obtained by PSILENCE, a pseudonymization tool created for redacting semi-automatically international arbitration documents in English. PSILENCE has been built using a Named Entity Recognition (NER) system, along with a Coreference Resolution system. These systems allow us to find the people that we need to redact in a clustered way, but also to propose the same pseudonym throughout one document. This last aspect makes it easier to read and comprehend a redacted legal document. Different experiments were done on four different datasets, one of which was legal, and the results are promising, reaching a Macro F-score of up to 0.72 on the legal dataset.
2023
Jus Mundi at SemEval-2023 Task 6: Using a Frustratingly Easy Domain Adaption for a Legal Named Entity Recognition System
Luis Adrián Cabrera-Diego
|
Akshita Gheewala
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
In this work, we present a Named Entity Recognition (NER) system that was trained using a Frustratingly Easy Domain Adaptation (FEDA) over multiple legal corpora. The goal was to create a NER capable of detecting 14 types of legal named entities in Indian judgments. Besides the FEDA architecture, we explored a method based on overlapping context and averaging tensors to process long input texts, which can be beneficial when processing legal documents. The proposed NER reached an F1-score of 0.9007 in the sub-task B of Semeval-2023 Task 6, Understanding Legal Texts.