Yuan Zhao


2024

pdf bib
MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms
Patrick Lee | Alain Chirino Trujillo | Diana Cuevas Plancarte | Olumide Ojo | Xinyi Liu | Iyanuoluwa Shode | Yuan Zhao | Anna Feldman | Jing Peng
Findings of the Association for Computational Linguistics: EACL 2024

Euphemisms are found across the world’s languages, making them a universal linguistic phenomenon. As such, euphemistic data may have useful properties for computational tasks across languages. In this study, we explore this premise by training a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases where multilingual models perform better on the task compared to monolingual models by a statistically significant margin, indicating that multilingual data presents additional opportunities for models to learn about cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic “categories” such as death and bodily functions among others. We test to see whether cross-lingual data of the same domain is more important than within-language data of other domains to further understand the nature of the cross-lingual transfer.

2023

pdf bib
FEED PETs: Further Experimentation and Expansion on the Disambiguation of Potentially Euphemistic Terms
Patrick Lee | Iyanuoluwa Shode | Alain Trujillo | Yuan Zhao | Olumide Ojo | Diana Plancarte | Anna Feldman | Jing Peng
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Transformers have been shown to work well for the task of English euphemism disambiguation, in which a potentially euphemistic term (PET) is classified as euphemistic or non-euphemistic in a particular context. In this study, we expand on the task in two ways. First, we annotate PETs for vagueness, a linguistic property associated with euphemisms, and find that transformers are generally better at classifying vague PETs, suggesting linguistic differences in the data that impact performance. Second, we present novel euphemism corpora in three different languages: Yoruba, Spanish, and Mandarin Chinese. We perform euphemism disambiguation experiments in each language using multilingual transformer models mBERT and XLM-RoBERTa, establishing preliminary results from which to launch future work.