Garam Lee
2022
Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption
Garam Lee | Minsoo Kim | Jai Hyun Park | Seung-won Hwang | Jung Hee Cheon
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Embeddings, which compress information in raw text into semantics-preserving low-dimensional vectors, have been widely adopted for their efficacy. However, recent research has shown that embeddings can potentially leak private information about sensitive attributes of the text, and in some cases, can be inverted to recover the original input text. To address these growing privacy challenges, we propose a privatization mechanism for embeddings based on homomorphic encryption, to prevent potential leakage of any piece of information in the process of text classification. In particular, our method performs text classification on encrypted embeddings from state-of-the-art models like BERT, supported by an efficient GPU implementation of the CKKS encryption scheme. We show that our method offers encrypted protection of BERT embeddings, while largely preserving their utility on downstream text classification tasks.
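A minimal sketch of the kind of inference step the abstract describes, assuming the open-source TenSEAL library as a CKKS backend rather than the paper's custom GPU implementation: a BERT embedding is encrypted on the client, and a plaintext linear classification head (the weights `W` and bias `b` below are hypothetical stand-ins for a trained classifier) is evaluated on the ciphertext server-side.

```python
# Illustrative sketch only, not the paper's implementation: text classification
# on a CKKS-encrypted BERT embedding using the TenSEAL library.
import numpy as np
import tenseal as ts

# CKKS context; parameters chosen to support one plaintext-ciphertext
# multiplication followed by rescaling.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()  # rotations are needed for vector-matrix products

embedding = np.random.randn(768)   # stand-in for a 768-dim BERT [CLS] embedding
W = np.random.randn(768, 2)        # hypothetical 2-class linear head (plaintext)
b = np.random.randn(2)

# Client side: encrypt the embedding before sending it to the server.
enc_embedding = ts.ckks_vector(context, embedding.tolist())

# Server side: evaluate the plaintext linear head homomorphically; the server
# never sees the embedding in the clear.
enc_logits = enc_embedding.mm(W.tolist()) + b.tolist()

# Client side: decrypt the logits and take the argmax as the predicted label.
logits = enc_logits.decrypt()
print("predicted class:", int(np.argmax(logits)))
```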
Toward Privacy-preserving Text Embedding Similarity with Homomorphic Encryption
Donggyu Kim | Garam Lee | Sungwoo Oh
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)
Text embeddings are an essential component for building efficient natural language applications based on text similarity, such as search engines and chatbots. Certain industries, such as finance and healthcare, demand strict privacy requirements: user data must not be exposed to any potentially malicious party, including the service provider. From a privacy standpoint, text embeddings may appear uninterpretable, yet there remains a risk that they can be recovered into the original texts through inversion attacks. To satisfy such privacy requirements, in this paper we study Homomorphic Encryption (HE)-based text similarity inference. To validate our method, we perform extensive experiments on two vital text similarity tasks. Through text embedding inversion tests, we show that the benchmark datasets are vulnerable to inversion attacks and that dχ-privacy, a relaxed version of Local Differential Privacy, fails to prevent them. We show that our approach preserves model performance, whereas the baseline degrades by up to 10% in scores even at the minimum security level.
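A minimal sketch of HE-based similarity inference, again assuming TenSEAL as the CKKS backend (not the paper's system): the client encrypts its query embedding, the server computes dot products against plaintext corpus embeddings homomorphically, and only the client can decrypt the similarity scores. The embeddings below are random stand-ins, L2-normalized so the encrypted dot product equals cosine similarity.

```python
# Illustrative sketch only: encrypted text-similarity scoring under CKKS
# using the TenSEAL library.
import numpy as np
import tenseal as ts

context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

def normalize(v):
    # L2-normalize so dot product equals cosine similarity.
    return v / np.linalg.norm(v)

dim = 768
query = normalize(np.random.randn(dim))                       # client query embedding
corpus = [normalize(np.random.randn(dim)) for _ in range(3)]  # server-side documents

# Client side: encrypt the query embedding.
enc_query = ts.ckks_vector(context, query.tolist())

# Server side: one homomorphic dot product per document; scores stay encrypted.
enc_scores = [enc_query.dot(doc.tolist()) for doc in corpus]

# Client side: decrypt the similarity scores and rank the documents.
scores = [s.decrypt()[0] for s in enc_scores]
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print("ranked documents:", ranking)
```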
Co-authors
- Minsoo Kim 1
- Jai Hyun Park 1
- Seung-won Hwang 1
- Jung Hee Cheon 1
- Donggyu Kim 1