Privacy-Preserving Models for Legal Natural Language Processing

Ying Yin, Ivan Habernal


Abstract
Pre-training large transformer models with in-domain data improves domain adaptation and helps gain performance on domain-specific downstream tasks. However, sharing models pre-trained on potentially sensitive data is prone to adversarial privacy attacks. In this paper, we ask to what extent we can guarantee privacy of pre-training data and, at the same time, achieve better downstream performance on legal tasks without the need for additional labeled data. We extensively experiment with scalable self-supervised learning of transformer models under the formal paradigm of differential privacy and show that under specific training configurations we can improve downstream performance without sacrificing privacy protection for the in-domain data. Our main contribution is utilizing differential privacy for large-scale pre-training of transformer language models in the legal NLP domain, which, to the best of our knowledge, has not been addressed before.
Anthology ID:
2022.nllp-1.14
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
172–183
URL:
https://aclanthology.org/2022.nllp-1.14
DOI:
10.18653/v1/2022.nllp-1.14
Cite (ACL):
Ying Yin and Ivan Habernal. 2022. Privacy-Preserving Models for Legal Natural Language Processing. In Proceedings of the Natural Legal Language Processing Workshop 2022, pages 172–183, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Privacy-Preserving Models for Legal Natural Language Processing (Yin & Habernal, NLLP 2022)
PDF:
https://aclanthology.org/2022.nllp-1.14.pdf
Video:
https://aclanthology.org/2022.nllp-1.14.mp4