Qiongkai Xu


pdf bib
Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!
Xuanli He | Lingjuan Lyu | Lichao Sun | Qiongkai Xu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by the pretrained language models, such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from different attacks launched by the malicious users. In this work, we first present how an adversary can steal a BERT-based API service (the victim/target model) on multiple benchmark datasets with limited prior knowledge and queries. We further show that the extracted model can lead to highly transferable adversarial attacks against the victim model. Our studies indicate that the potential vulnerabilities of BERT-based API services still hold, even when there is an architectural mismatch between the victim model and the attack model. Finally, we investigate two defence strategies to protect the victim model, and find that unless the performance of the victim model is sacrificed, both model extraction and adversarial transferability can effectively compromise the target models.


pdf bib
Personal Information Leakage Detection in Conversations
Qiongkai Xu | Lizhen Qu | Zeyu Gao | Gholamreza Haffari
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The global market size of conversational assistants (chatbots) is expected to grow to USD 9.4 billion by 2024, according to MarketsandMarkets. Despite the wide use of chatbots, leakage of personal information through chatbots poses serious privacy concerns for their users. In this work, we propose to protect personal information by warning users of detected suspicious sentences generated by conversational assistants. The detection task is formulated as an alignment optimization problem and a new dataset PERSONA-LEAKAGE is collected for evaluation. In this paper, we propose two novel constrained alignment models, which consistently outperform baseline methods on Moreover, we conduct analysis on the behavior of recently proposed personalized chit-chat dialogue systems. The empirical results show that those systems suffer more from personal information disclosure than the widely used Seq2Seq model and the language model. In those cases, a significant number of information leaking utterances can be detected by our models with high precision.


pdf bib
Privacy-Aware Text Rewriting
Qiongkai Xu | Lizhen Qu | Chenchen Xu | Ran Cui
Proceedings of the 12th International Conference on Natural Language Generation

Biased decisions made by automatic systems have led to growing concerns in research communities. Recent work from the NLP community focuses on building systems that make fair decisions based on text. Instead of relying on unknown decision systems or human decision-makers, we argue that a better way to protect data providers is to remove the trails of sensitive information before publishing the data. In light of this, we propose a new privacy-aware text rewriting task and explore two privacy-aware back-translation methods for the task, based on adversarial training and approximate fairness risk. Our extensive experiments on three real-world datasets with varying demographical attributes show that our methods are effective in obfuscating sensitive attributes. We have also observed that the fairness risk method retains better semantics and fluency, while the adversarial training method tends to leak less sensitive information.

pdf bib
ALTER: Auxiliary Text Rewriting Tool for Natural Language Generation
Qiongkai Xu | Chenchen Xu | Lizhen Qu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

In this paper, we describe ALTER, an auxiliary text rewriting tool that facilitates the rewriting process for natural language generation tasks, such as paraphrasing, text simplification, fairness-aware text rewriting, and text style transfer. Our tool is characterized by two features, i) recording of word-level revision histories and ii) flexible auxiliary edit support and feedback to annotators. The text rewriting assist and traceable rewriting history are potentially beneficial to the future research of natural language generation.


pdf bib
EPUTION at SemEval-2018 Task 2: Emoji Prediction with User Adaption
Liyuan Zhou | Qiongkai Xu | Hanna Suominen | Tom Gedeon
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes our approach, called EPUTION, for the open trial of the SemEval- 2018 Task 2, Multilingual Emoji Prediction. The task relates to using social media — more precisely, Twitter — with its aim to predict the most likely associated emoji of a tweet. Our solution for this text classification problem explores the idea of transfer learning for adapting the classifier based on users’ tweeting history. Our experiments show that our user-adaption method improves classification results by more than 6 per cent on the macro-averaged F1. Thus, our paper provides evidence for the rationality of enriching the original corpus longitudinally with user behaviors and transferring the lessons learned from corresponding users to specific instances.


pdf bib
Demographic Inference on Twitter using Recursive Neural Networks
Sunghwan Mac Kim | Qiongkai Xu | Lizhen Qu | Stephen Wan | Cécile Paris
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In social media, demographic inference is a critical task in order to gain a better understanding of a cohort and to facilitate interacting with one’s audience. Most previous work has made independence assumptions over topological, textual and label information on social networks. In this work, we employ recursive neural networks to break down these independence assumptions to obtain inference about demographic characteristics on Twitter. We show that our model performs better than existing models including the state-of-the-art.


pdf bib
Unsupervised Pre-training With Seq2Seq Reconstruction Loss for Deep Relation Extraction Models
Zhuang Li | Lizhen Qu | Qiongkai Xu | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2016


pdf bib
Detecting English-French Cognates Using Orthographic Edit Distance
Qiongkai Xu | Albert Chen | Chang Li
Proceedings of the Australasian Language Technology Association Workshop 2015


pdf bib
Using Deep Linguistic Features for Finding Deceptive Opinion Spam
Qiongkai Xu | Hai Zhao
Proceedings of COLING 2012: Posters