Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption

Garam Lee; Minsoo Kim; Jai Hyun Park; Seung-won Hwang; Jung Hee Cheon

doi:10.18653/v1/2022.naacl-main.231

Abstract

Embeddings, which compress information in raw text into semantics-preserving low-dimensional vectors, have been widely adopted for their efficacy. However, recent research has shown that embeddings can potentially leak private information about sensitive attributes of the text, and in some cases, can be inverted to recover the original input text. To address these growing privacy challenges, we propose a privatization mechanism for embeddings based on homomorphic encryption, to prevent potential leakage of any piece of information in the process of text classification. In particular, our method performs text classification on the encryption of embeddings from state-of-the-art models like BERT, supported by an efficient GPU implementation of the CKKS encryption scheme. We show that our method offers encrypted protection of BERT embeddings, while largely preserving their utility on downstream text classification tasks.
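As a rough illustration of the idea in the abstract, the sketch below shows the client/server flow for classifying an embedding without the server seeing it in the clear. It is a plaintext simulation only: `mock_encrypt` and `mock_decrypt` are hypothetical stand-ins that add tiny noise to mimic the approximate arithmetic of CKKS and provide no security. The linear classifier head, the function names, and the toy weights are all assumptions made for illustration, not the paper's architecture or implementation. The point is that the server-side scoring uses only additions and multiplications, which are the operations a scheme like CKKS can evaluate on ciphertexts.

```python
import random


def mock_encrypt(vec, noise=1e-6, seed=0):
    """Hypothetical stand-in for CKKS encryption (NOT secure).

    Adds tiny noise to each value to mimic approximate arithmetic.
    """
    rng = random.Random(seed)
    return [x + rng.uniform(-noise, noise) for x in vec]


def mock_decrypt(ciphertext):
    """Hypothetical stand-in for decryption: returns the values unchanged."""
    return list(ciphertext)


def he_dot(ciphertext, weights):
    """Dot product built only from multiplications and additions."""
    acc = 0.0
    for c, w in zip(ciphertext, weights):
        acc += c * w
    return acc


def encrypted_logits(ct_embedding, weight_rows, biases):
    """Server side: one score per class, computed on the 'encrypted' vector."""
    return [he_dot(ct_embedding, row) + b for row, b in zip(weight_rows, biases)]


def classify(embedding, weight_rows, biases):
    """End-to-end flow: encrypt, score on the server, decrypt, pick the argmax."""
    ct = mock_encrypt(embedding)                            # client side
    ct_logits = encrypted_logits(ct, weight_rows, biases)   # server side
    logits = mock_decrypt(ct_logits)                        # client side
    label = max(range(len(logits)), key=logits.__getitem__)
    return label, logits


# Toy example: a 3-dimensional "embedding" and a 2-class linear head.
label, logits = classify([0.5, -1.0, 2.0], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0])
```

In a real deployment the mock functions would be replaced by an actual CKKS library, and the argmax would be taken by the client after decryption, since comparisons are not natively available on ciphertexts.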

Anthology ID: 2022.naacl-main.231
Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: July
Year: 2022
Address: Seattle, United States
Editors: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 3169–3175
URL: https://aclanthology.org/2022.naacl-main.231
DOI: 10.18653/v1/2022.naacl-main.231
Cite (ACL): Garam Lee, Minsoo Kim, Jai Hyun Park, Seung-won Hwang, and Jung Hee Cheon. 2022. Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3169–3175, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption (Lee et al., NAACL 2022)
PDF: https://aclanthology.org/2022.naacl-main.231.pdf
Software: 2022.naacl-main.231.software.zip
Video: https://aclanthology.org/2022.naacl-main.231.mp4
