Portia Cooper

2025

The Lies Characters Tell: Utilizing Large Language Models to Normalize Adversarial Unicode Perturbations
Portia Cooper | Eduardo Blanco | Mihai Surdeanu
Findings of the Association for Computational Linguistics: ACL 2025

Homoglyphs, Unicode characters that are visually homogeneous to Latin letters, are widely used to mask offensive content. Dynamic strategies are needed to combat homoglyphs as the Unicode library is ever-expanding and new substitution possibilities for Latin letters continuously emerge. The present study investigated two novel mitigation approaches that do not rely on strict mappings but instead harness the power of large language models to neutralize both known and unknown homoglyphs: (1) indirectly normalizing homoglyphs by replacing non-Latin characters with a delimiter and prompting large language models to “fill in the blanks” and (2) directly normalizing homoglyphs by using large language models to determine which characters should be replaced with Latin letters. We found that GPT-4o-mini constructed normalized text with an average cosine similarity score of 0.91 to the original tweets when applying our indirect method and 0.96 to the original tweets when applying our direct method. This study indicates that large language model-based normalization techniques can effectively unmask offensive content concealed by homoglyphs. Code and data are available in our GitHub repository: https://github.com/pcoopercoder/The-Lies-Characters-Tell.
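
The masking step of the indirect method described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the underscore delimiter, the definition of "Latin" as printable ASCII, and the prompt wording in the hypothetical `build_prompt` helper are all assumptions, and the call to a language model is left out.

```python
import string

# Assumption: "Latin" is approximated as printable ASCII; the paper may draw the line elsewhere.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \t\n")


def mask_non_latin(text: str, delimiter: str = "_") -> str:
    """Replace every character outside ALLOWED with the delimiter."""
    return "".join(ch if ch in ALLOWED else delimiter for ch in text)


def build_prompt(masked: str, delimiter: str = "_") -> str:
    """Hypothetical fill-in-the-blanks prompt; the wording is illustrative only."""
    return (
        f"Each '{delimiter}' in the text below stands for one missing Latin letter. "
        "Rewrite the text with the blanks filled in.\n\n" + masked
    )


# Cyrillic 'е' (U+0435) and 'о' (U+043E) stand in for their Latin look-alikes.
example = "h\u0435llo w\u043erld"
masked = mask_non_latin(example)
print(masked)  # h_llo w_rld
print(build_prompt(masked))
```

One limitation of this sketch is that an underscore already present in the input cannot be told apart from a masked character, so a delimiter that never occurs in the data would be a safer choice in practice.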

2023

Hiding in Plain Sight: Tweets with Hate Speech Masked by Homoglyphs
Portia Cooper | Mihai Surdeanu | Eduardo Blanco
Findings of the Association for Computational Linguistics: EMNLP 2023

To avoid detection by current NLP monitoring applications, progenitors of hate speech often replace one or more letters in offensive words with homoglyphs, visually similar Unicode characters. Harvesting real-world hate speech containing homoglyphs is challenging due to the vast replacement possibilities. We developed a character substitution scraping method and assembled the Offensive Tweets with Homoglyphs (OTH) Dataset (N=90,788) with more than 1.5 million occurrences of 1,281 non-Latin characters (emojis excluded). In an annotated sample (n=700), 40.14% of the tweets were found to contain hate speech. We assessed the performance of seven transformer-based hate speech detection models and found that they performed poorly in a zero-shot setting (F1 scores between 0.04 and 0.52) but normalizing the data dramatically improved detection (F1 scores between 0.59 and 0.71). Training the models using the annotated data further boosted performance (highest micro-averaged F1 score=0.88, using five-fold cross validation). This study indicates that a dataset containing homoglyphs known and unknown to the scraping script can be collected, and that neural models can be trained to recognize camouflaged real-world hate speech.
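
The idea of character substitution can be illustrated by generating homoglyph spellings of a word, which could then serve as search terms. The sketch below is an assumption about how such variants might be enumerated, not the scraping method used in the paper: the `HOMOGLYPHS` table is a tiny hypothetical sample, and `substitution_variants` is an illustrative name.

```python
from itertools import product

# Hypothetical, tiny homoglyph table (Cyrillic look-alikes); not taken from the paper.
HOMOGLYPHS = {"a": ["\u0430"], "e": ["\u0435"], "o": ["\u043e"]}


def substitution_variants(word: str) -> list[str]:
    """Return every spelling of `word` with one or more letters swapped for a homoglyph."""
    # For each position, the choices are the original letter plus any known look-alikes.
    options = [[ch] + HOMOGLYPHS.get(ch, []) for ch in word]
    variants = ["".join(combo) for combo in product(*options)]
    # Drop the unmodified spelling so only substituted forms remain.
    return [v for v in variants if v != word]


for variant in substitution_variants("tea"):
    print(variant, [hex(ord(ch)) for ch in variant])
```

The number of variants grows exponentially with the number of substitutable letters, which is one way to see why the abstract describes the replacement possibilities as vast.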

Co-authors

Eduardo Blanco 2
Mihai Surdeanu 2

Venues

Findings 2
