Preston Thomas
2024
Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts
Shayna Gardiner | Tania Habib | Kevin Humphreys | Masha Azizi | Frederic Mailhot | Anne Paling | Preston Thomas | Nathan Zhang
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
Large language models in public-facing industrial applications must accurately process data for the domain in which they are deployed, but they must not leak sensitive or confidential information when used. We present a process for anonymizing training data, a framework for quantitatively and qualitatively assessing the effectiveness of this process, and an assessment of the effectiveness of models fine-tuned on anonymized data in comparison with commercially available LLM APIs.