Anonymise: A Tool for Multilingual Document Pseudonymisation

Rinalds Vīksna, Inguna Skadina


Abstract
According to EU legislation, documents containing personal information need to be anonymised before public sharing. However, manual anonymisation is a time-consuming and costly process. Thus, there is a need for a robust text de-identification technique that accurately identifies and replaces personally identifiable information. This paper introduces the Anonymise tool, a system for document de-identification. The tool accepts text documents of various types (e.g., MS Word, plain text), de-identifies personal information, and saves the de-identified document in its original format. The tool employs a modular architecture, integrating list-based matching, regular expressions, and deep-learning-based named entity recognition to detect spans for redaction. Our evaluation results demonstrate high recall rates, making Anonymise a reliable solution for ensuring no sensitive information is left exposed. The tool can be accessed through a user-friendly web-based interface or an API, offering flexibility for both individual and large-scale document processing needs. By automating document de-identification with high accuracy and efficiency, Anonymise helps ensure compliance with EU privacy regulations while reducing the time and cost associated with manual anonymisation.
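The modular detection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Anonymise implementation: the name list, the regular expressions, the `[LABEL]` placeholder format, and all function names are assumptions made for illustration, and the named entity recognition component is left out (it could be added as one more detector returning spans in the same shape).

```python
import re
from typing import Callable, Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical resources, for illustration only.
KNOWN_NAMES = ["Anna Liepa", "John Smith"]
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ]{6,}\d"),
}


def list_detector(text: str) -> List[Span]:
    """List-based matching: find every occurrence of a known name."""
    spans = []
    for name in KNOWN_NAMES:
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end(), "PERSON"))
    return spans


def regex_detector(text: str) -> List[Span]:
    """Pattern-based matching for structured identifiers."""
    return [
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def merge_spans(spans: Iterable[Span]) -> List[Span]:
    """Merge overlapping spans, keeping the label of the earliest one."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def pseudonymise(
    text: str,
    detectors: Iterable[Callable[[str], List[Span]]] = (list_detector, regex_detector),
) -> str:
    """Run all detectors, merge their spans, and replace each span with a placeholder."""
    spans = merge_spans(s for detect in detectors for s in detect(text))
    out, cursor = [], 0
    for start, end, label in spans:
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


print(pseudonymise("Contact Anna Liepa at anna@example.com or +371 2000 0000."))
```

Because every detector returns spans in the same `(start, end, label)` shape, further detectors can be added without changing the merging or replacement steps. Taking the union of all detectors' spans favours recall over precision, which matches the abstract's emphasis on leaving no sensitive information exposed.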
Anthology ID:
2025.ranlp-1.154
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
1327–1332
URL:
https://aclanthology.org/2025.ranlp-1.154/
Cite (ACL):
Rinalds Vīksna and Inguna Skadina. 2025. Anonymise: A Tool for Multilingual Document Pseudonymisation. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 1327–1332, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Anonymise: A Tool for Multilingual Document Pseudonymisation (Vīksna & Skadina, RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.154.pdf