Personal Information Leakage Detection in Conversations

Qiongkai Xu, Lizhen Qu, Zeyu Gao, Gholamreza Haffari


Abstract
The global market size of conversational assistants (chatbots) is expected to grow to USD 9.4 billion by 2024, according to MarketsandMarkets. Despite the wide use of chatbots, leakage of personal information through chatbots poses serious privacy concerns for their users. In this work, we propose to protect personal information by warning users of detected suspicious sentences generated by conversational assistants. The detection task is formulated as an alignment optimization problem, and a new dataset, PERSONA-LEAKAGE, is collected for evaluation. We propose two novel constrained alignment models, which consistently outperform baseline methods on PERSONA-LEAKAGE. Moreover, we analyze the behavior of recently proposed personalized chit-chat dialogue systems. The empirical results show that those systems suffer more from personal information disclosure than the widely used Seq2Seq model and the language model. In those cases, a significant number of information-leaking utterances can be detected by our models with high precision.
Anthology ID:
2020.emnlp-main.532
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
6567–6580
URL:
https://aclanthology.org/2020.emnlp-main.532
DOI:
10.18653/v1/2020.emnlp-main.532
Cite (ACL):
Qiongkai Xu, Lizhen Qu, Zeyu Gao, and Gholamreza Haffari. 2020. Personal Information Leakage Detection in Conversations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6567–6580, Online. Association for Computational Linguistics.
Cite (Informal):
Personal Information Leakage Detection in Conversations (Xu et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.532.pdf
Video:
https://slideslive.com/38939030
Code:
xuqiongkai/PILD