Automated Anonymization of Parole Hearing Transcripts

Abed Itani, Wassiliki Siskou, Annette Hautli-Janisz


Abstract
Responsible natural language processing is increasingly concerned with preventing the violation of personal rights that language technology can entail (CITATION). In this paper we illustrate the case of parole hearings in California, the verbatim transcripts of which are made available to the general public upon request to the California Board of Parole Hearings. The parole hearing setting is highly sensitive: inmates face a board of legal representatives who discuss highly personal matters not only about the inmates themselves but also about victims and their relatives, such as spouses and children. Participants have no choice about contributing to the data collection process, since the disclosure of the transcripts is mandated by law. As researchers who are interested in understanding and modeling the communication in these hierarchy-driven settings, we face an ethical dilemma: publishing the raw data as is for the community would compromise the privacy of all individuals affected, but manually cleaning the data requires substantial effort. In this paper we present an automated anonymization process which reliably removes and pseudonymizes sensitive data in verbatim transcripts, while at the same time preserving the structure and content of the data. Our results show that the process exhibits little to no leakage of sensitive information when applied to more than 300 hearing transcripts.
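The abstract does not describe how the anonymization process works internally. As a rough illustration of what pseudonymization that preserves transcript structure can look like, the sketch below replaces a given list of names with stable placeholders. The function name `pseudonymize`, the `PERSON_n` placeholder scheme, and the assumption that sensitive names are already known are all hypothetical and not taken from the paper.

```python
import re


def pseudonymize(text, names, label="PERSON"):
    """Replace each known name with a stable placeholder such as PERSON_1.

    Hypothetical helper: assumes the sensitive names are already known,
    which a real pipeline would have to detect first.
    """
    mapping = {}
    # Longest names first, so a full name is matched before a shorter one
    # that it contains.
    ordered = sorted(set(names), key=len, reverse=True)
    if not ordered:
        return text, mapping
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b"
    )

    def substitute(match):
        name = match.group(1)
        # The same name always maps to the same placeholder, so
        # references stay consistent across the transcript.
        if name not in mapping:
            mapping[name] = f"{label}_{len(mapping) + 1}"
        return mapping[name]

    return pattern.sub(substitute, text), mapping


sample = (
    "PRESIDING COMMISSIONER: Mr. Smith, when did you last speak to Jane Roe?\n"
    "INMATE: I wrote to Jane Roe in March."
)
anonymized, mapping = pseudonymize(sample, ["Smith", "Jane Roe"])
print(anonymized)
```

Speaker labels and line breaks are left untouched, and each name maps to the same placeholder wherever it recurs, which is one simple way to keep the dialogue structure usable for later analysis.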
Anthology ID:
2024.nllp-1.9
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2024
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
115–128
URL:
https://aclanthology.org/2024.nllp-1.9
Cite (ACL):
Abed Itani, Wassiliki Siskou, and Annette Hautli-Janisz. 2024. Automated Anonymization of Parole Hearing Transcripts. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 115–128, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Automated Anonymization of Parole Hearing Transcripts (Itani et al., NLLP 2024)
PDF:
https://aclanthology.org/2024.nllp-1.9.pdf