Preston Thomas

2024

Large language models in public-facing industrial applications must accurately process data for the domain in which they are deployed, but they must not leak sensitive or confidential information when used. We present a process for anonymizing training data, a framework for quantitatively and qualitatively assessing the effectiveness of this process, and an assessment of the effectiveness of models fine-tuned on anonymized data in comparison with commercially available LLM APIs.

Co-authors

Masha Azizi 1
Shayna Gardiner 1
Tania Habib 1
Kevin Humphreys 1
Frederic Mailhot 1
Anne Paling 1
Nathan Zhang 1

Venues

CALD-pseudo1
WS1
