Varun Chandrasekaran

2024

Bypassing LLM Watermarks with Color-Aware Substitutions
Qilong Wu | Varun Chandrasekaran
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Watermarking approaches have been proposed to identify whether circulated text is human- or large language model (LLM)-generated. The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific (“green”) tokens. However, determining the robustness of this watermarking method under finite (low) edit budgets is an open problem. Additionally, existing attack methods fail to evade detection for longer text segments. We overcome these limitations and propose Self Color Testing-based Substitution (SCTS), the first “color-aware” attack. SCTS obtains color information by strategically prompting the watermarked LLM and comparing output token frequencies. It uses this information to determine token colors, and substitutes green tokens with non-green ones. In our experiments, SCTS successfully evades watermark detection using fewer edits than related work. Additionally, we show both theoretically and empirically that SCTS can remove the watermark for arbitrarily long watermarked text.
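
To make the idea of "green" tokens and substitution concrete, the sketch below shows a toy version of green-list detection and a color-aware replacement pass. It is an illustration under stated assumptions, not the paper's implementation: the hash-based `is_green` rule, the `alternatives` synonym table, and all function names are hypothetical, and the color of each token is read from a known oracle here, whereas SCTS has to infer colors by prompting the watermarked LLM. The z-score is the standard one-proportion statistic used for green-list detection, with `gamma` as the assumed green-list fraction.

```python
import hashlib
import math


def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Toy color rule (assumption): hash the (previous token, token) pair
    to a number in [0, 1) and call the token green if it falls below gamma."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < gamma


def green_count(tokens, gamma: float = 0.5) -> int:
    """Count green tokens; the first token has no context and is not scored."""
    return sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))


def z_score(num_green: int, num_scored: int, gamma: float = 0.5) -> float:
    """One-proportion z-statistic: how far the green count sits above the
    gamma * num_scored expected for unwatermarked text."""
    return (num_green - gamma * num_scored) / math.sqrt(
        num_scored * gamma * (1 - gamma)
    )


def substitute_green(tokens, alternatives, gamma: float = 0.5):
    """Left-to-right pass: replace a green token with the first listed
    alternative that is not green in its (already rewritten) context."""
    out = list(tokens[:1])
    for tok in tokens[1:]:
        choice = tok
        if is_green(out[-1], tok, gamma):
            for alt in alternatives.get(tok, []):
                if not is_green(out[-1], alt, gamma):
                    choice = alt
                    break
        out.append(choice)
    return out
```

A detector would flag text whose z-score exceeds a chosen threshold; each successful substitution lowers the green count and therefore the score. Tokens with no non-green alternative are left unchanged, so this toy pass does not guarantee evasion.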

Designing Informative Metrics for Few-Shot Example Selection
Rishabh Adiga | Lakshmi Subramanian | Varun Chandrasekaran
Findings of the Association for Computational Linguistics: ACL 2024

Pretrained language models (PLMs) have shown remarkable few-shot learning capabilities when provided with properly formatted examples. However, selecting the “best” examples remains an open challenge. We propose a complexity-based prompt selection approach for sequence tagging tasks. This approach avoids training a dedicated model for example selection, and instead uses certain metrics to align the syntactico-semantic complexity of test sentences and examples. We use both sentence- and word-level metrics to match the complexity of examples to the (test) sentence being considered. Our results demonstrate that our approach extracts greater performance from PLMs: it achieves state-of-the-art performance on few-shot NER, with a 5% absolute improvement in F1 score on the CoNLL2003 dataset for GPT-4. We also see large gains of up to 28.85 points (F1/Acc.) in smaller models like GPT-J-6B.
