Shakila Mahjabin Tonni
2025
Some Odd Adversarial Perturbations and the Notion of Adversarial Closeness
Shakila Mahjabin Tonni | Pedro Faustini | Mark Dras
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association
Deep learning models for language are vulnerable to adversarial examples. However, the perturbations introduced can sometimes seem odd or very noticeable to humans, which can make them less effective, a notion captured in some recent investigations as a property of '(non-)suspicion'. In this paper, we focus on three main types of perturbations that may raise suspicion: changes to named entities, inconsistent morphological inflections, and the use of non-English words. We define a notion of adversarial closeness and collect human annotations to construct two new datasets. We then use these datasets to investigate whether these kinds of perturbations have a disproportionate effect on human judgements. Following that, we propose new constraints to include in a constraint-based optimisation approach to adversarial text generation. Our human evaluation shows that these constraints do improve the process by preventing the generation of especially odd or marked texts.
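To make the kinds of constraints described above concrete, the following is a minimal Python sketch (not the paper's implementation) of a filter that rejects candidate word swaps likely to raise suspicion: changes to capitalised, named-entity-like tokens, non-English or non-alphabetic replacements, and inconsistent inflection. The word list, heuristics, and function names are illustrative assumptions only.

```python
# Minimal sketch (not the authors' implementation): filtering candidate
# perturbations that are likely to raise human suspicion, following the
# three perturbation types discussed in the abstract. The word list and
# the crude named-entity heuristic are placeholder assumptions.
import re

ENGLISH_WORDS = {"good", "great", "film", "movie", "plot", "acting"}  # stand-in for a real lexicon

def is_suspicious_swap(original_token: str, candidate_token: str) -> bool:
    """Return True if replacing original_token with candidate_token is
    likely to look odd to a human reader."""
    # 1. Named-entity-like tokens (crudely approximated by capitalisation)
    #    should not be altered.
    if original_token[:1].isupper() and candidate_token != original_token:
        return True
    # 2. Non-English or non-alphabetic replacements are easy to spot.
    if not re.fullmatch(r"[A-Za-z'-]+", candidate_token):
        return True
    if candidate_token.lower() not in ENGLISH_WORDS:
        return True
    # 3. Inconsistent inflection, e.g. swapping a plural for a singular.
    if original_token.endswith("s") != candidate_token.endswith("s"):
        return True
    return False

def filter_candidates(original_token: str, candidates: list[str]) -> list[str]:
    """Keep only replacement candidates that pass the suspicion checks."""
    return [c for c in candidates if not is_suspicious_swap(original_token, c)]

if __name__ == "__main__":
    # Only "great" survives: "bueno" is non-English, "Goods" looks like a
    # named entity / inflection change, "g00d" is non-alphabetic.
    print(filter_candidates("good", ["great", "bueno", "Goods", "g00d"]))
```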
IDT: Dual-Task Adversarial Rewriting for Attribute Anonymization
Pedro Faustini | Shakila Mahjabin Tonni | Annabelle McIver | Qiongkai Xu | Mark Dras
Computational Linguistics, Volume 51, Issue 4 - December 2025
Natural language processing (NLP) models may leak private information in different ways, including through membership inference, reconstruction, or attribute inference attacks. Sensitive information may not be explicit in the text but hidden in underlying writing characteristics. Methods to protect privacy can involve using representations inside models that are demonstrated not to detect sensitive attributes, or changing the raw text before models have access to it, for instance when users might be at risk from an untrustworthy model, the scenario of interest here. The goal is to rewrite text to prevent someone from inferring a sensitive attribute (e.g., the gender of the author, or their location from the writing style) while keeping the text useful for its original purpose (e.g., the sentiment of a product review). The few works tackling this have focused on generative techniques, but these often produce texts that differ extensively from the originals or face problems such as mode collapse. This article explores a novel adaptation of adversarial attack techniques to manipulate a text so that it deceives a classifier with respect to one task (privacy) while keeping the predictions of another classifier, trained for a different task (utility), unchanged. We propose IDT, a method that analyses predictions made by auxiliary, interpretable models to identify which tokens are important to change for the privacy task and which should be kept for the utility task. We evaluate on NLP datasets suitable for different tasks. Automatic and human evaluations show that IDT retains the utility of the text while also outperforming existing methods at deceiving a classifier with respect to the privacy task.
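The following is a rough, self-contained sketch of the token-selection idea behind IDT as summarised in the abstract, not the released code: token importances for a privacy classifier and a utility classifier are estimated by occlusion, and only tokens that matter more for privacy than for utility become rewrite candidates. The `predict_proba` callables and the occlusion-style measure are assumptions for illustration.

```python
# Illustrative sketch only (not the released IDT code): rank tokens by how
# much they matter to a privacy classifier versus a utility classifier, so
# that only privacy-relevant tokens are candidates for rewriting.
from typing import Callable, List, Tuple

def occlusion_importance(tokens: List[str],
                         predict_proba: Callable[[str], float]) -> List[float]:
    """Importance of each token = drop in the classifier's confidence
    when that token is removed from the text."""
    base = predict_proba(" ".join(tokens))
    scores = []
    for i in range(len(tokens)):
        occluded = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append(base - predict_proba(occluded))
    return scores

def select_tokens_to_rewrite(tokens: List[str],
                             privacy_proba: Callable[[str], float],
                             utility_proba: Callable[[str], float],
                             margin: float = 0.0) -> List[Tuple[int, str]]:
    """Pick tokens whose importance for the privacy task exceeds their
    importance for the utility task; these are rewritten, the rest kept."""
    priv = occlusion_importance(tokens, privacy_proba)
    util = occlusion_importance(tokens, utility_proba)
    return [(i, t) for i, (t, p, u) in enumerate(zip(tokens, priv, util))
            if p - u > margin]
```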
Graded Suspiciousness of Adversarial Texts to Humans
Shakila Mahjabin Tonni | Pedro Faustini | Mark Dras
Computational Linguistics, Volume 51, Issue 3 - September 2025
Adversarial examples pose a significant challenge to deep neural networks across both image and text domains, aiming to degrade model performance through carefully altered inputs. Adversarial texts, however, are distinct from adversarial images due to their requirement for semantic similarity and the discrete nature of textual content. This study delves into the concept of human suspiciousness, a quality distinct from the traditional focus on imperceptibility found in image-based adversarial examples, where adversarial changes are often desired to be indistinguishable to the human eye even when placed side by side with the originals. Although this is generally not possible with text, textual adversarial content must still often remain undetected or non-suspicious to human readers. Even when the text’s purpose is to deceive NLP systems or bypass filters, the text is often expected to read naturally. In this research, we expand the study of human suspiciousness by analyzing how individuals perceive adversarial texts. We gather and publish a novel dataset of Likert-scale human evaluations of the suspiciousness of adversarial sentences crafted by four widely used adversarial attack methods, and assess their correlation with the human ability to detect machine-generated alterations. Additionally, we develop a regression-based model to predict levels of suspiciousness and establish a baseline for future research on reducing suspiciousness in adversarial text generation. We also demonstrate how the regressor-generated suspiciousness scores can be incorporated into adversarial generation methods to produce texts that are less likely to be perceived as computer-generated.
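As an illustration of how a suspiciousness regressor could feed back into attack generation (a minimal sketch under assumptions, not the paper's code), the snippet below re-ranks adversarial candidates by a combination of attack strength and predicted Likert-scale suspiciousness; `attack_score` and `suspicion_regressor` are hypothetical callables standing in for trained models.

```python
# Sketch: use a trained suspiciousness regressor to re-rank adversarial
# candidates so that the attack prefers texts predicted to look natural.
# attack_score and suspicion_regressor are placeholder callables.
from typing import Callable, List

def pick_least_suspicious(candidates: List[str],
                          attack_score: Callable[[str], float],
                          suspicion_regressor: Callable[[str], float],
                          max_suspicion: float = 3.0,
                          alpha: float = 0.5) -> str:
    """Among candidates that still fool the target model (attack_score > 0),
    choose the one with the best trade-off between attack strength and the
    regressor's predicted Likert-scale suspiciousness."""
    viable = [c for c in candidates
              if attack_score(c) > 0 and suspicion_regressor(c) <= max_suspicion]
    if not viable:
        raise ValueError("no candidate both succeeds and stays below the suspicion threshold")
    return max(viable, key=lambda c: attack_score(c) - alpha * suspicion_regressor(c))
```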
CSIRO LT at SemEval-2025 Task 8: Answering Questions over Tabular Data using LLMs
Tomas Turek | Shakila Mahjabin Tonni | Vincent Nguyen | Huichen Yang | Sarvnaz Karimi
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Question answering over large tables is challenging because of the reasoning required to link information from different parts of a table, such as headings and metadata, to the values in the table and to the information need. We investigate using Large Language Models (LLMs) for tabular reasoning, where, given a table and a question from the DataBench benchmark, the models generate answers. We experiment with three techniques that enable symbolic reasoning through code execution: a direct code prompting (DCP) approach, ‘DCP_Py’, which uses Python; multi-step code (MSC) prompting, ‘MSC_SQL+FS’, which uses SQL; and ReAct prompting, ‘MSR_Py+FS’, which combines multi-step reasoning (MSR), few-shot (FS) learning, and Python tools. We also conduct an analysis exploring the impact of answer types, data size, and multi-column dependencies on LLMs’ answer generation performance, including an assessment of the models’ limitations and the underlying challenges of tabular reasoning in LLMs.
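The snippet below sketches the direct code prompting (DCP_Py) idea in its simplest form, assuming a generic `llm_generate` completion callable and an illustrative prompt; it is not the system described in the paper. The LLM is asked to write pandas code that answers the question over the table, and the returned code is executed to obtain the answer.

```python
# Rough sketch of direct code prompting over a table: ask an LLM to write
# pandas code that answers a question about a DataFrame, then execute the
# returned snippet. llm_generate is a placeholder for whatever completion
# client is used; the prompt wording is illustrative only.
import pandas as pd
from typing import Callable

PROMPT_TEMPLATE = (
    "You are given a pandas DataFrame `df` with columns: {columns}.\n"
    "Write Python code that stores the answer to this question in a variable "
    "named `answer`.\nQuestion: {question}\nCode:"
)

def answer_with_code(df: pd.DataFrame, question: str,
                     llm_generate: Callable[[str], str]) -> object:
    prompt = PROMPT_TEMPLATE.format(columns=list(df.columns), question=question)
    code = llm_generate(prompt)
    scope = {"df": df, "pd": pd}
    exec(code, scope)  # model-written code; in practice this would be sandboxed
    return scope.get("answer")
```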