Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts

Shayna Gardiner, Tania Habib, Kevin Humphreys, Masha Azizi, Frederic Mailhot, Anne Paling, Preston Thomas, Nathan Zhang


Abstract
Large language models in public-facing industrial applications must accurately process data for the domain in which they are deployed, but they must not leak sensitive or confidential information when used. We present a process for anonymizing training data, a framework for quantitatively and qualitatively assessing the effectiveness of this process, and an assessment of the effectiveness of models fine-tuned on anonymized data in comparison with commercially available LLM APIs.
Anthology ID: 2024.caldpseudo-1.8
Volume: Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Elena Volodina, David Alfter, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Maria Irena Szawerna, Xuan-Son Vu
Venues: CALD-pseudo | WS
Publisher: Association for Computational Linguistics
Pages: 64–75
URL: https://aclanthology.org/2024.caldpseudo-1.8
Cite (ACL): Shayna Gardiner, Tania Habib, Kevin Humphreys, Masha Azizi, Frederic Mailhot, Anne Paling, Preston Thomas, and Nathan Zhang. 2024. Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts. In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024), pages 64–75, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts (Gardiner et al., CALD-pseudo-WS 2024)
PDF: https://aclanthology.org/2024.caldpseudo-1.8.pdf