Wu Ye
2024
An Inversion Attack Against Obfuscated Embedding Matrix in Language Model Inference
Yu Lin | Qizhi Zhang | Quanwei Cai | Jue Hong | Wu Ye | Huiqi Liu | Bing Duan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
With the rapidly growing deployment of large language model (LLM) inference services, privacy concerns have arisen regarding user input data. Recent studies explore transforming user inputs into obfuscated embedding vectors so that the data cannot be eavesdropped on by service providers. However, in this paper we show that, once again, without a solid and deliberate security design and analysis, such embedding-vector obfuscation fails to protect users’ privacy. We demonstrate this by conducting a novel inversion attack, called Element-wise Differential Nearest Neighbor (EDNN), on the glide-reflection scheme proposed in (CITATION); the results show that the original user input text can be 100% recovered from the obfuscated embedding vectors. We further analyze the security requirements on embedding obfuscation and present several remedies against our proposed attack.
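The abstract does not spell out the attack, but the core idea of a differential nearest-neighbor recovery can be illustrated with a toy sketch. The sketch below assumes (these are illustrative assumptions, not details from the paper) that the glide-reflection acts as a coordinate sign-flip `R` plus a translation `t`, and that the attacker knows the plaintext token at one anchor position; differences between obfuscated vectors cancel `t`, and element-wise absolute differences are invariant under `R`, so each token can be recovered by nearest-neighbor search over the public embedding matrix. All names (`E`, `obfuscate`, `recover`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5000, 64                      # vocabulary size, embedding dimension
E = rng.normal(size=(V, d))          # public embedding matrix

R = np.diag(rng.choice([-1.0, 1.0], size=d))   # secret coordinate reflection
t = rng.normal(size=d)                          # secret translation

def obfuscate(token_ids):
    # Glide-reflection-style obfuscation: y = x @ R + t (R is symmetric).
    return E[token_ids] @ R + t

def recover(Y, anchor_id):
    # Assumes position 0 of Y holds the known anchor token `anchor_id`.
    # Differences cancel t, and |R v| == |v| coordinate-wise, so
    # |y_i - y_0| equals |x_i - x_anchor| element by element.
    ref = np.abs(E - E[anchor_id])           # (V, d) candidate diff patterns
    out = []
    for y in Y:
        obs = np.abs(y - Y[0])               # observed element-wise diff
        out.append(np.argmin(((ref - obs) ** 2).sum(axis=1)))
    return np.array(out)

tokens = rng.integers(0, V, size=12)
tokens[0] = 42                               # known anchor token
assert (recover(obfuscate(tokens), anchor_id=42) == tokens).all()
```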
2020
A Multitask Active Learning Framework for Natural Language Understanding
Hua Zhu | Wu Ye | Sihan Luo | Xidong Zhang
Proceedings of the 28th International Conference on Computational Linguistics
Natural language understanding (NLU) aims at identifying user intent and extracting semantic slots. This requires sufficient annotated data to reach considerable performance in real-world situations. Active learning (AL) has been well studied as a way to decrease the amount of annotated data needed and has been successfully applied to NLU. However, no research has investigated how the relation information between intents and slots can improve the efficiency of AL algorithms. In this paper, we propose a multitask AL framework for NLU. Our framework enables pool-based AL algorithms to make use of the relation information between sub-tasks provided by a joint model, and we propose an efficient computation for the entropy of a joint model. Experimental results show that our framework achieves competitive performance with less training data than baseline methods on all datasets. We also demonstrate that, when using entropy as the query strategy, the model with complete relation information performs better than those with partial information. Additionally, we demonstrate that the efficiency of these active learning algorithms in our framework remains effective when incorporated with Bidirectional Encoder Representations from Transformers (BERT).
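To make the entropy-based query strategy concrete, here is a minimal sketch of pool-based selection for a joint intent/slot model. It assumes (purely for illustration; the paper's actual joint-entropy computation may differ) that the joint posterior factorizes as p(intent) times a product of per-token slot posteriors, so the joint entropy is the sum of the per-factor entropies; the names `joint_entropy` and `select_batch` are hypothetical.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy of a (batch of) categorical distribution(s)."""
    return -(p * np.log(p + eps)).sum(axis=axis)

def joint_entropy(intent_probs, slot_probs):
    # intent_probs: (N, n_intents); slot_probs: (N, T, n_slot_labels).
    # Under the factorization assumption, H(joint) = H(intent) + sum_t H(slot_t).
    return entropy(intent_probs) + entropy(slot_probs).sum(axis=-1)

def select_batch(intent_probs, slot_probs, k):
    """Query the k pool utterances the joint model is most uncertain about."""
    scores = joint_entropy(intent_probs, slot_probs)
    return np.argsort(scores)[::-1][:k]

# Example: rank a pool of 100 utterances of length 20.
rng = np.random.default_rng(0)
ip = rng.dirichlet(np.ones(7), size=100)          # intent posteriors
sp = rng.dirichlet(np.ones(15), size=(100, 20))   # per-token slot posteriors
queried = select_batch(ip, sp, k=8)               # indices to send for labeling
```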