Xuanli He - ACL Anthology

Xuanli He

2025

Despite the widespread adoption of Large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g. African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench—a human-translated benchmark dataset for 17 typologically-diverse low-resource African languages covering three tasks: natural language inference(AfriXNLI), mathematical reasoning(AfriMGSM), and multi-choice knowledge-based QA(AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings(where test sets are translated into English) across 10 open and four proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We observe a significant performance gap between open and proprietary models, with the highest performing open model, Gemma 2 27B only at 63% of the best-performing proprietary model GPT-4o performance. Machine translating the test set to English before evaluation helped to close the gap for larger models that are English-centric, like Gemma 2 27B and LLaMa 3.1 70B. These findings suggest that more efforts are needed to develop and adapt LLMs for African languages.

Self-Training Large Language Models for Tool-Use Without Demonstrations
Ne Luo | Aryo Pradipta Gema | Xuanli He | Emile Van Krieken | Pietro Lesci | Pasquale Minervini
Findings of the Association for Computational Linguistics: NAACL 2025

Large language models (LLMs) remain prone to factual inaccuracies and computational errors, including hallucinations and mistakes in mathematical reasoning. Recent work augmented LLMs with tools to mitigate these shortcomings, but often requires curated gold tool-use demonstrations. In this paper, we investigate whether LLMs can learn to use tools without demonstrations. First, we analyse zero-shot prompting strategies to guide LLMs in tool utilisation. Second, we propose a self-training method to synthesise tool-use traces using the LLM itself. We compare supervised fine-tuning and preference fine-tuning techniques for fine-tuning the model on datasets constructed using existing Question Answering (QA) datasets, i.e., TriviaQA and GSM8K. Experiments show that tool-use enhances performance on a long-tail knowledge task: 3.7% on PopQA, which is used solely for evaluation, but leads to mixed results on other datasets, i.e., TriviaQA, GSM8K, and NQ-Open. Our findings highlight the potential and challenges of integrating external tools into LLMs without demonstrations.

Maybe not. We identify and analyse errors in the popular Massive Multitask Language Understanding (MMLU) benchmark. Even though MMLU is widely adopted, our analysis demonstrates numerous ground truth errors that obscure the true capabilities of LLMs. For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive framework for identifying dataset errors using a novel error annotation protocol. Then, we create MMLU-Redux, which is a subset of 5,700 manually re-annotated questions across all 57 MMLU subjects. Using MMLU-Redux, we demonstrate significant discrepancies with the model performance metrics that were originally reported. Our results strongly advocate for revising MMLU’s error-ridden questions to enhance its future utility and reliability as a benchmark. Therefore, we open up MMLU-Redux for additional annotation.

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao | Alessio Devoto | Giwon Hong | Xiaotang Du | Aryo Pradipta Gema | Hongru Wang | Xuanli He | Kam-Fai Wong | Pasquale Minervini
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large language models (LLMs) can store a significant amount of factual knowledge in their parameters. However, their parametric knowledge may conflict with the information provided in the context—this phenomenon, known as context-memory knowledge conflicts, can lead to undesirable model behaviour, such as reliance on outdated or incorrect information. Analysing the internal activations of LLMs, we find that they can internally register the signals of knowledge conflict at mid-layers. Such signals allow us to detect whether a knowledge conflict occurs and use inference-time intervention strategies to resolve it. In this work, we propose SpARE, a training-free representation engineering method that uses pre-trained sparse auto-encoders (SAEs) to control the knowledge selection behaviour of LLMs. SpARE identifies the functional features that control the knowledge selection behaviours and applies them to edit the internal activations of LLMs at inference time. Our experimental results show that SpARE can effectively control the usage of either knowledge source to resolve knowledge conflict in open-domain question-answering tasks, surpassing existing representation engineering methods (+10%) as well as contrastive decoding methods (+15%).

Cut the Deadwood Out: Backdoor Purification via Guided Module Substitution
Yao Tong | Weijun Li | Xuanli He | Haolan Zhan | Qiongkai Xu
Findings of the Association for Computational Linguistics: EMNLP 2025

Model NLP models are commonly trained (or fine-tuned) on datasets from untrusted platforms like HuggingFace, posing significant risks of data poisoning attacks. A practical yet underexplored challenge arises when such backdoors are discovered after model deployment, making retraining-required defenses less desirable due to computational costs and data constraints. In this work, we propose Guided Module Substitution (GMS), an effective retraining-free method based on guided merging of the victim model with a single proxy model. Specifically, GMS selectively replaces modules in the victim model based on a trade-off signal between utility and backdoor. GMS offers four desirable properties: (1) robustness to the choice and trustworthiness of the proxy model, (2) applicability under relaxed data assumptions, (3) stability across hyperparameters, and (4) transferability across different attacks. Extensive experiments on encoder models and decoder LLMs demonstrate the strong effectiveness of GMS. GMS significantly outperforms even the strongest defense baseline, particularly against challenging attacks like LWS.

GRADA: Graph-based Reranking against Adversarial Documents Attack
Jingjie Zheng | Aryo Pradipta Gema | Giwon Hong | Xuanli He | Pasquale Minervini | Youcheng Sun | Qiongkai Xu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Retrieval Augmented Generation (RAG) frameworks can improve the factual accuracy of large language models (LLMs) by integrating external knowledge from retrieved documents, thereby overcoming the limitations of models’ static intrinsic knowledge. However, these systems are susceptible to adversarial attacks that manipulate the retrieval process by introducing documents that are adversarial yet semantically similar to the query. Notably, while these adversarial documents resemble the query, they exhibit weak similarity to benign documents in the retrieval set. Thus, we propose a simple yet effective **G**raph-based **R**eranking against **A**dversarial **D**ocument **A**ttacks (GRADA) framework aiming at preserving retrieval quality while significantly reducing the success of adversaries. Our study evaluates the effectiveness of our approach through experiments conducted on six LLMs: GPT-3.5-Turbo, GPT-4o, Llama3.1-8b-Instruct, Llama3.1-70b-Instruct, Qwen2.5-7b-Instruct and Qwen2.5-14b-Instruct. We use three datasets to assess performance, with results from the Natural Questions dataset demonstrating up to an 80% reduction in attack success rates while maintaining minimal loss in accuracy.

TUBA: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning
Xuanli He | Jun Wang | Qiongkai Xu | Pasquale Minervini | Pontus Stenetorp | Benjamin I. P. Rubinstein | Trevor Cohn
Findings of the Association for Computational Linguistics: ACL 2025

The implications of backdoor attacks on English-centric large language models (LLMs) have been widely examined — such attacks can be achieved by embedding malicious behaviors during training and activated under specific conditions that trigger malicious outputs. Despite the increasing support for multilingual capabilities in open-source and proprietary LLMs, the impact of backdoor attacks on these systems remains largely under-explored. Our research focuses on cross-lingual backdoor attacks against multilingual LLMs, particularly investigating how poisoning the instructiontuning data for one or two languages can affect the outputs for languages whose instructiontuning data were not poisoned. Despite its simplicity, our empirical analysis reveals that our method exhibits remarkable efficacy in models like BLOOM and GPT-4o, with high attack success rates, surpassing 90% in more than 7 out of 12 languages across various scenarios. Our findings also indicate that more powerful models show increased susceptibility to transferable cross-lingual backdoor attacks, which also applies to LLMs predominantly pre-trained on English/Chinese data, such as Llama2, Llama3, Qwen2.5, and Gemma. Moreover, our experiments demonstrate 1) High Transferability: the backdoor mechanism operates successfully in cross lingual response scenarios across 26 languages, achieving an average attack success rate of 99%, and 2) Robustness: the proposed attack remains effective even after defenses are applied. These findings expose critical security vulnerabilities in multilingual LLMs and highlight the urgent need for more robust, targeted defense strategies to address the unique challenges posed by cross-lingual backdoor transfer.

2024

AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
Jiayi Wang | David Ifeoluwa Adelani | Sweta Agrawal | Marek Masiak | Ricardo Rei | Eleftheria Briakou | Marine Carpuat | Xuanli He | Sofia Bourhim | Andiswa Bukula | Muhidin Mohamed | Temitayo Olatoye | Tosin Adewumi | Hamam Mokayed | Christine Mwase | Wangui Kimotho | Foutse Yuehgoh | Anuoluwapo Aremu | Jessica Ojo | Shamsuddeen Hassan Muhammad | Salomey Osei | Abdul-Hakeem Omotayo | Chiamaka Chukwuneke | Perez Ogayo | Oumaima Hourrane | Salma El Anigri | Lolwethu Ndolela | Thabiso Mangwana | Shafie Abdi Mohamed | Ayinde Hassan | Oluwabusayo Olufunke Awoyomi | Lama Alkhaled | Sana Al-Azzawi | Naome A. Etori | Millicent Ochieng | Clemencia Siro | Samuel Njoroge | Eric Muchiri | Wangari Kimotho | Lyse Naomi Wamba Momo | Daud Abolade | Simbiat Ajao | Iyanuoluwa Shode | Ricky Macharm | Ruqayya Nasir Iro | Saheed S. Abdullahi | Stephen E. Moore | Bernard Opoku | Zainab Akinjobi | Abeeb Afolabi | Nnaemeka Obiefuna | Onyekachi Raphael Ogbu | Sam Brian | Verrah Akinyi Otiende | Chinedu Emmanuel Mbonu | Sakayo Toadoum Sari | Yao Lu | Pontus Stenetorp
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).

Backdoor Attacks on Multilingual Machine Translation
Jun Wang | Qiongkai Xu | Xuanli He | Benjamin Rubinstein | Trevor Cohn
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

While multilingual machine translation (MNMT) systems hold substantial promise, they also have security vulnerabilities. Our research highlights that MNMT systems can be susceptible to a particularly devious style of backdoor attack, whereby an attacker injects poisoned data into a low-resource language pair to cause malicious translations in other languages, including high-resource languages.Our experimental results reveal that injecting less than 0.01% poisoned data into a low-resource language pair can achieve an average 20% attack success rate in attacking high-resource language pairs. This type of attack is of particular concern, given the larger attack surface of languages inherent to low-resource settings. Our aim is to bring attention to these vulnerabilities within MNMT systems with the hope of encouraging the community to address security concerns in machine translation, especially in the context of low-resource languages.

SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks
Xuanli He | Qiongkai Xu | Jun Wang | Benjamin I. P. Rubinstein | Trevor Cohn
Transactions of the Association for Computational Linguistics, Volume 12

Modern NLP models are often trained on public datasets drawn from diverse sources, rendering them vulnerable to data poisoning attacks. These attacks can manipulate the model’s behavior in ways engineered by the attacker. One such tactic involves the implantation of backdoors, achieved by poisoning specific training instances with a textual trigger and a target class label. Several strategies have been proposed to mitigate the risks associated with backdoor attacks by identifying and removing suspected poisoned examples. However, we observe that these strategies fail to offer effective protection against several advanced backdoor attacks. To remedy this deficiency, we propose a novel defensive mechanism that first exploits training dynamics to identify poisoned samples with high precision, followed by a label propagation step to improve recall and thus remove the majority of poisoned instances. Compared with recent advanced defense methods, our method considerably reduces the success rates of several backdoor attacks while maintaining high classification accuracy on clean test sets.

Here’s a Free Lunch: Sanitizing Backdoored Models with Model Merge
Ansh Arora | Xuanli He | Maximilian Mozes | Srinibas Swain | Mark Dras | Qiongkai Xu
Findings of the Association for Computational Linguistics: ACL 2024

The democratization of pre-trained language models through open-source initiatives has rapidly advanced innovation and expanded access to cutting-edge technologies. However, this openness also brings significant security risks, including backdoor attacks, where hidden malicious behaviors are triggered by specific inputs, compromising natural language processing (NLP) system integrity and reliability. This paper suggests that merging a backdoored model with other homogeneous models can significantly remediate backdoor vulnerabilities even if such models are not entirely secure. In our experiments, we verify our hypothesis on various models (BERT-Base, RoBERTa-Large, Llama2-7B, and Mistral-7B) and datasets (SST-2, OLID, AG News, and QNLI). Compared to multiple advanced defensive approaches, our method offers an effective and efficient inference-stage defense against backdoor attacks on classification and instruction-tuned tasks without additional resources or specific knowledge. Our approach consistently outperforms recent advanced baselines, leading to an average of about 75% reduction in the attack success rate. Since model merging has been an established approach for improving model performance, the extra advantage it provides regarding defense can be seen as a cost-free bonus.

Using Natural Language Explanations to Improve Robustness of In-context Learning
Xuanli He | Yuxiang Wu | Oana-Maria Camburu | Pasquale Minervini | Pontus Stenetorp
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent studies demonstrated that large language models (LLMs) can excel in many tasks via in-context learning (ICL). However, recentworks show that ICL-prompted models tend to produce inaccurate results when presented with adversarial inputs. In this work, we investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets covering natural language inference and paraphrasing identification. We prompt LLMs with a small set of human-generated NLEs to produce further NLEs, yielding more accurate results than both a zero-shot-ICL setting and using only human-generated NLEs. Our results on five popular LLMs (GPT3.5-turbo, Llama2, Vicuna, Zephyr, and Mistral) show that our approach yields over 6% improvement over baseline approaches for eight adversarial datasets: HANS, ISCS, NaN, ST, PICD, PISP, ANLI, and PAWS. Furthermore, previous studies have demonstrated that prompt selection strategies significantly enhance ICL on in-distribution test sets. However, our findings reveal that these strategies do not match the efficacy of our approach for robustness evaluations, resulting in an accuracy drop of 8% compared to the proposed approach.

2023

Security Challenges in Natural Language Processing Models
Qiongkai Xu | Xuanli He
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Large-scale natural language processing models have been developed and integrated into numerous applications, given the advantage of their remarkable performance. Nonetheless, the security concerns associated with these models prevent the widespread adoption of these black-box machine learning models. In this tutorial, we will dive into three emerging security issues in NLP research, i.e., backdoor attacks, private data leakage, and imitation attacks. These threats will be introduced in accordance with their threatening usage scenarios, attack methodologies, and defense technologies.

IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks
Xuanli He | Jun Wang | Benjamin Rubinstein | Trevor Cohn
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)

Backdoor attacks are an insidious security threat against machine learning models. Adversaries can manipulate the predictions of compromised models by inserting triggers into the training phase. Various backdoor attacks have been devised which can achieve nearly perfect attack success without affecting model predictions for clean inputs. Means of mitigating such vulnerabilities are underdeveloped, especially in natural language processing. To fill this gap, we introduce IMBERT, which uses either gradients or self-attention scores derived from victim models to self-defend against backdoor attacks at inference time. Our empirical studies demonstrate that IMBERT can effectively identify up to 98.5% of inserted triggers. Thus, it significantly reduces the attack success rate while attaining competitive accuracy on the clean dataset across widespread insertion-based attacks compared to two baselines. Finally, we show that our approach is model-agnostic, and can be easily ported to several pre-trained transformer models.

Overview of the 2023 ALTA Shared Task: Discriminate between Human-Written and Machine-Generated Text
Diego Molla | Haolan Zhan | Xuanli He | Qiongkai Xu
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association

The ALTA shared tasks have been running annually since 2010. In 2023, the purpose of the task is to build automatic detection systems that can discriminate between human-written and synthetic text generated by Large Language Models (LLM). In this paper we present the task, the evaluation criteria, and the results of the systems participating in the shared task.

Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation
Xuanli He | Qiongkai Xu | Jun Wang | Benjamin Rubinstein | Trevor Cohn
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Modern NLP models are often trained over large untrusted datasets, raising the potential for a malicious adversary to compromise model behaviour. For instance, backdoors can be implanted through crafting training instances with a specific textual trigger and a target label. This paper posits that backdoor poisoning attacks exhibit a spurious correlation between simple text features and classification labels, and accordingly, proposes methods for mitigating spurious correlation as means of defence. Our empirical study reveals that the malicious triggers are highly correlated to their target labels; therefore such correlations are extremely distinguishable compared to those scores of benign features, and can be used to filter out potentially problematic instances. Compared with several existing defences, our defence method significantly reduces attack success rates across backdoor attacks, and in the case of insertion-based attacks, our method provides a near-perfect defence.

Koala: An Index for Quantifying Overlaps with Pre-training Corpora
Thuy-Trang Vu | Xuanli He | Gholamreza Haffari | Ehsan Shareghi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In very recent years more attention has been placed on probing the role of pre-training data in Large Language Models (LLMs) downstream behaviour. Despite the importance, there is no public tool that supports such analysis of pre-training corpora at large scale. To help research in this space, we launch Koala, a searchable index over large pre-training corpora using lossless compressed suffix arrays with highly efficient compression rate and search support. In its first release we index the public proportion of OPT 175B, GPT-3, GPT-Neo, GPT-Neo, LLaMA, BERT, ELECTRA, RoBERTA, XLNet pre-training corpora. Koala provides a framework to do forensic analysis on the current and future benchmarks as well as to assess the degree of memorization in the output from the LLMs. Koala is available for public use at https://koala-index.erc.monash.edu/.

Rethinking Round-Trip Translation for Machine Translation Evaluation
Terry Yue Zhuo | Qiongkai Xu | Xuanli He | Trevor Cohn
Findings of the Association for Computational Linguistics: ACL 2023

Automatic evaluation methods for translation often require model training, and thus the availability of parallel corpora limits their applicability to low-resource settings. Round-trip translation is a potential workaround, which can reframe bilingual evaluation into a much simpler monolingual task. Early results from the era of statistical machine translation (SMT) raised fundamental concerns about the utility of this approach, based on poor correlation with human translation quality judgments. In this paper, we revisit this technique with modern neural translation (NMT) and show that round-trip translation does allow for accurate automatic evaluation without the need for reference translations. These opposite findings can be explained through the copy mechanism in SMT that is absent in NMT. We demonstrate that round-trip translation benefits multiple machine translation evaluation tasks: i) predicting forward translation scores; ii) improving the performance of a quality estimation model; and iii) identifying adversarial competitors in shared tasks via cross-system verification.

2022

Can Domains Be Transferred across Languages in Multi-Domain Multilingual Neural Machine Translation?
Thuy-Trang Vu | Shahram Khadivi | Xuanli He | Dinh Phung | Gholamreza Haffari
Proceedings of the Seventh Conference on Machine Translation (WMT)

Previous works mostly focus on either multilingual or multi-domain aspects of neural machine translation (NMT). This paper investigates whether the domain information can be transferred across languages on the composition of multi-domain and multilingual NMT, particularly for the incomplete data condition where in-domain bitext is missing for some language pairs. Our results in the curated leave-one-domain-out experiments show that multi-domain multilingual (MDML) NMT can boost zero-shot translation performance up to +10 gains on BLEU, as well as aid the generalisation of multi-domain NMT to the missing domain. We also explore strategies for effective integration of multilingual and multi-domain NMT, including language and domain tag combination and auxiliary task training. We find that learning domain-aware representations and adding target-language tags to the encoder leads to effective MDML-NMT.

Generate, Annotate, and Learn: NLP with Synthetic Text
Xuanli He | Islam Nassar | Jamie Kiros | Gholamreza Haffari | Mohammad Norouzi
Transactions of the Association for Computational Linguistics, Volume 10

This paper studies the use of language models as a source of synthetic unlabeled text for NLP. We formulate a general framework called “generate, annotate, and learn (GAL)” to take advantage of synthetic text within knowledge distillation, self-training, and few-shot learning applications. To generate high-quality task-specific text, we either fine-tune LMs on inputs from the task of interest, or prompt large LMs with few examples. We use the best available classifier to annotate synthetic text with soft pseudo labels for knowledge distillation and self-training, and use LMs to obtain hard labels for few-shot learning. We train new supervised models on the combination of labeled and pseudo-labeled data, which results in significant gains across several applications. We investigate key components of GAL and present theoretical and empirical arguments against the use of class-conditional LMs to generate synthetic labeled text instead of unlabeled text. GAL achieves new state-of-the-art knowledge distillation results for 6-layer transformers on the GLUE leaderboard.

Foiling Training-Time Attacks on Neural Machine Translation Systems
Jun Wang | Xuanli He | Benjamin Rubinstein | Trevor Cohn
Findings of the Association for Computational Linguistics: EMNLP 2022

Neural machine translation (NMT) systems are vulnerable to backdoor attacks, whereby an attacker injects poisoned samples into training such that a trained model produces malicious translations. Nevertheless, there is little research on defending against such backdoor attacks in NMT. In this paper, we first show that backdoor attacks that have been successful in text classification are also effective against machine translation tasks. We then present a novel defence method that exploits a key property of most backdoor attacks: namely the asymmetry between the source and target language sentences, which is used to facilitate malicious text insertions, substitutions and suchlike. Our technique uses word alignment coupled with language model scoring to detect outlier tokens, and thus can find and filter out training instances which may contain backdoors. Experimental results demonstrate that our technique can significantly reduce the success of various attacks by up to 89.0%, while not affecting predictive accuracy.

Extracted BERT Model Leaks More Information than You Think!
Xuanli He | Lingjuan Lyu | Chen Chen | Qiongkai Xu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The collection and availability of big data, combined with advances in pre-trained models (e.g. BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulating fine-tuned BERT-based models as APIs. Due to significant commercial interest, there has been a surge of attempts to steal remote services via model extraction. Although previous works have made progress in defending against model extraction attacks, there has been little discussion on their performance in preventing privacy leakage. This work bridges this gap by launching an attribute inference attack against the extracted BERT model. Our extensive experiments reveal that model extraction can cause severe privacy leakage even when victim models are facilitated with state-of-the-art defensive strategies.

Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs
Qiongkai Xu | Xuanli He | Lingjuan Lyu | Lizhen Qu | Gholamreza Haffari
Proceedings of the 29th International Conference on Computational Linguistics

Machine-learning-as-a-service (MLaaS) has attracted millions of users to their splendid large-scale models. Although published as black-box APIs, the valuable models behind these services are still vulnerable to imitation attacks. Recently, a series of works have demonstrated that attackers manage to steal or extract the victim models. Nonetheless, none of the previous stolen models can outperform the original black-box APIs. In this work, we conduct unsupervised domain adaptation and multi-victim ensemble to showing that attackers could potentially surpass victims, which is beyond previous understanding of model extraction. Extensive experiments on both benchmark datasets and real-world APIs validate that the imitators can succeed in outperforming the original black-box models on transferred domains. We consider our work as a milestone in the research of imitation attack, especially on NLP APIs, as the superior performance could influence the defense or even publishing strategy of API providers.

2021

Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!
Xuanli He | Lingjuan Lyu | Qiongkai Xu | Lichao Sun
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by the pretrained language models, such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from different attacks launched by the malicious users. In this work, we first present how an adversary can steal a BERT-based API service (the victim/target model) on multiple benchmark datasets with limited prior knowledge and queries. We further show that the extracted model can lead to highly transferable adversarial attacks against the victim model. Our studies indicate that the potential vulnerabilities of BERT-based API services still hold, even when there is an architectural mismatch between the victim model and the attack model. Finally, we investigate two defence strategies to protect the victim model, and find that unless the performance of the victim model is sacrificed, both model extraction and adversarial transferability can effectively compromise the target models.

Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection
Thuy-Trang Vu | Xuanli He | Dinh Phung | Gholamreza Haffari
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT), where we assume the access to only monolingual text in either the source or target language in the new domain. We propose a cross-lingual data selection method to extract in-domain sentences in the missing language side from a large generic monolingual corpus. Our proposed method trains an adaptive layer on top of multilingual BERT by contrastive learning to align the representation between the source and target language. This then enables the transferability of the domain classifier between the languages in a zero-shot manner. Once the in-domain data is detected by the classifier, the NMT model is then adapted to the new domain by jointly learning translation and domain discrimination tasks. We evaluate our cross-lingual data selection method on NMT across five diverse domains in three language pairs, as well as a real-world scenario of translation for COVID-19. The results show that our proposed method outperforms other selection baselines up to +1.5 BLEU score.

2020

Scene Graph Modification Based on Natural Language Commands
Xuanli He | Quan Hung Tran | Gholamreza Haffari | Walter Chang | Zhe Lin | Trung Bui | Franck Dernoncourt | Nhan Dam
Findings of the Association for Computational Linguistics: EMNLP 2020

Structured representations like graphs and parse trees play a crucial role in many Natural Language Processing systems. In recent years, the advancements in multi-turn user interfaces necessitate the need for controlling and updating these structured representations given new sources of information. Although there have been many efforts focusing on improving the performance of the parsers that map text to graphs or parse trees, very few have explored the problem of directly manipulating these representations. In this paper, we explore the novel problem of graph modification, where the systems need to learn how to update an existing scene graph given a new user’s command. Our novel models based on graph-based sparse transformer and cross attention information fusion outperform previous systems adapted from the machine translation and graph generation literature. We further contribute our large graph modification datasets to the research community to encourage future research for this new problem.

Differentially Private Representation for NLP: Formal Guarantee and An Empirical Study on Privacy and Fairness
Lingjuan Lyu | Xuanli He | Yitong Li
Findings of the Association for Computational Linguistics: EMNLP 2020

It has been demonstrated that hidden representation learned by deep model can encode private information of the input, hence can be exploited to recover such information with reasonable accuracy. To address this issue, we propose a novel approach called Differentially Private Neural Representation (DPNR) to preserve privacy of the extracted representation from text. DPNR utilises Differential Privacy (DP) to provide formal privacy guarantee. Further, we show that masking words via dropout can further enhance privacy. To maintain utility of the learned representation, we integrate DP-noisy representation into a robust training process to derive a robust target model, which also helps for model fairness over various demographic variables. Experimental results on benchmark datasets under various parameter settings demonstrate that DPNR largely reduces privacy leakage without significantly sacrificing the main task performance.

Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
Xuanli He | Gholamreza Haffari | Mohammad Norouzi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper introduces Dynamic Programming Encoding (DPE), a new segmentation algorithm for tokenizing sentences into subword units. We view the subword segmentation of output sentences as a latent variable that should be marginalized out for learning and inference. A mixed character-subword transformer is proposed, which enables exact log marginal likelihood estimation and exact MAP inference to find target segmentations with maximum posterior probability. DPE uses a lightweight mixed character-subword transformer as a means of pre-processing parallel data to segment output sentences using dynamic programming. Empirical results on machine translation suggest that DPE is effective for segmenting output sentences and can be combined with BPE dropout for stochastic segmentation of source sentences. DPE achieves an average improvement of 0.9 BLEU over BPE (Sennrich et al., 2016) and an average improvement of 0.55 BLEU over BPE dropout (Provilkov et al., 2019) on several WMT datasets including English <=> (German, Romanian, Estonian, Finnish, Hungarian).

2019

A Pointer Network Architecture for Context-Dependent Semantic Parsing
Xuanli He | Quan Tran | Gholamreza Haffari
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

2018

Sequence to Sequence Mixture Model for Diverse Machine Translation
Xuanli He | Gholamreza Haffari | Mohammad Norouzi
Proceedings of the 22nd Conference on Computational Natural Language Learning

Sequence to sequence (SEQ2SEQ) models lack diversity in their generated translations. This can be attributed to their limitations in capturing lexical and syntactic variations in parallel corpora, due to different styles, genres, topics, or ambiguity of human translation process. In this paper, we develop a novel sequence to sequence mixture (S2SMIX) model that improves both translation diversity and quality by adopting a committee of specialized translation models rather than a single translation model. Each mixture component selects its own training dataset via optimization of the marginal log-likelihood, which leads to a soft clustering of the parallel corpus. Experiments on four language pairs demonstrate the superiority of our mixture model compared to SEQ2SEQ model with the standard and diversity encouraged beam search. Our mixture model incurs negligible additional parameters and no extra computation in the decoding time.

Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation
Xuanli He | Quan Tran | William Havard | Laurent Besacier | Ingrid Zukerman | Gholamreza Haffari
Proceedings of the Australasian Language Technology Association Workshop 2018

In spite of the recent success of Dialogue Act (DA) classification, the majority of prior works focus on text-based classification with oracle transcriptions, i.e. human transcriptions, instead of Automatic Speech Recognition (ASR)’s transcriptions. In spoken dialog systems, however, the agent would only have access to noisy ASR transcriptions, which may further suffer performance degradation due to domain shift. In this paper, we explore the effectiveness of using both acoustic and textual signals, either oracle or ASR transcriptions, and investigate speaker domain adaptation for DA classification. Our multimodal model proves to be superior to the unimodal models, particularly when the oracle transcriptions are not available. We also propose an effective method for speaker domain adaptation, which achieves competitive results.

2017

Word Representation Models for Morphologically Rich Languages in Neural Machine Translation
Ekaterina Vylomova | Trevor Cohn | Xuanli He | Gholamreza Haffari
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Out-of-vocabulary words present a great challenge for Machine Translation. Recently various character-level compositional models were proposed to address this issue. In current research we incorporate two most popular neural architectures, namely LSTM and CNN, into hard- and soft-attentional models of translation for character-level representation of the source. We propose semantic and morphological intrinsic evaluation of encoder-level representations. Our analysis of the learned representations reveals that character-based LSTM seems to be better at capturing morphological aspects compared to character-based CNN. We also show that hard-attentional model provides better character-level representations compared to vanilla one.

Co-authors

Aryo Pradipta Gema 4

Pontus Stenetorp 4

Mohammad Norouzi 3

Quan Hung Tran 3

David Ifeoluwa Adelani 2

Andiswa Bukula 2

Alessio Devoto 2

Shamsuddeen Hassan Muhammad 2

Lolwethu Ndolela 2

Millicent Ochieng 2

Emile Van Krieken 2

Foutse Yuehgoh 2

Saheed S. Abdullahi 1

Tosin Adewumi 1

Abeeb Afolabi 1

Sweta Agrawal 1

Zainab Akinjobi 1

Sana Al-Azzawi 1

Jesujoba Alabi 1

Lama Alkhaled 1

Anuoluwapo Aremu 1

Oluwabusayo Olufunke Awoyomi 1

Israel Abebe Azime 1

Claire Barale 1

Laurent Besacier 1

Sofia Bourhim 1

Eleftheria Briakou 1

Happy Buzaaba 1

Oana-Maria Camburu 1

Marine Carpuat 1

Chiamaka Ijeoma Chukwuneke 1

Chiamaka Chukwuneke 1

Franck Dernoncourt 1

Salma El Anigri 1

Naome A. Etori 1

Mohammad Reza Ghasemi Madani 1

Tadesse Kebede Guge 1

Joshua Harris 1

Ayinde Hassan 1

William Havard 1

Oumaima Hourrane 1

Ruqayya Nasir Iro 1

Salomon Kabongo Kabenamualu 1

Godson Koffi Kalipe 1

Shahram Khadivi 1

Wangui Kimotho 1

Wangari Kimotho 1

Joshua Ong Jun Leang 1

En-Shiun Annie Lee 1

Rooweither Mabuya 1

Ricky Macharm 1

Alberto Carlo Maria Mancino 1

Thabiso Mangwana 1

Chinedu Emmanuel Mbonu 1

Robert McHardy 1

Muhidin Mohamed 1

Shafie Abdi Mohamed 1

Hamam Mokayed 1

Stephen E. Moore 1

Maximilian Mozes 1

Jonathan Mukiibi 1

Christine Mwase 1

Samuel Njoroge 1

Nnaemeka Obiefuna 1

Onyekachi Raphael Ogbu 1

Temitayo Olatoye 1

Abdul-Hakeem Omotayo 1

Bernard Opoku 1

Verrah Akinyi Otiende 1

Mmasibidi Setaka 1

Ehsan Shareghi 1

Tombekai Vangoni Sherman 1

Iyanuoluwa Shode 1

Blessing Kudzaishe Sibanda 1

Clemencia Siro 1

Srinibas Swain 1

Sakayo Toadoum Sari 1

Ekaterina Vylomova 1

Lyse Naomi Wamba Momo 1

Jingjie Zheng 1

Jian Yun Zhuang 1

Terry Yue Zhuo 1

Ingrid Zukerman 1

Venues