Daniel Marciniak
2024
Understanding Slang with LLMs: Modelling Cross-Cultural Nuances through Paraphrasing
Ifeoluwa Wuraola
|
Nina Dethlefs
|
Daniel Marciniak
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
In the realm of social media discourse, the integration of slang enriches communication, reflecting the sociocultural identities of users. This study investigates the capability of large language models (LLMs) to paraphrase slang within climate-related tweets from Nigeria and the UK, with a focus on identifying emotional nuances. Using DistilRoBERTa as the base-line model, we observe its limited comprehension of slang. To improve cross-cultural understanding, we gauge the effectiveness of leading LLMs ChatGPT 4, Gemini, and LLaMA3 in slang paraphrasing. While ChatGPT 4 and Gemini demonstrate comparable effectiveness in slang paraphrasing, LLaMA3 shows less coverage, with all LLMs exhibiting limitations in coverage, especially of Nigerian slang. Our findings underscore the necessity for culturally sensitive LLM development in emotion classification, particularly in non-anglocentric regions.
2023
Linguistic Pattern Analysis in the Climate Change-Related Tweets from UK and Nigeria
Ifeoluwa Wuraola
|
Nina Dethlefs
|
Daniel Marciniak
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
To understand the global trends of human opinion on climate change in specific geographical areas, this research proposes a framework to analyse linguistic features and cultural differences in climate-related tweets. Our study combines transformer networks with linguistic feature analysis to address small dataset limitations and gain insights into cultural differences in tweets from the UK and Nigeria. Our study found that Nigerians use more leadership language and informal words in discussing climate change on Twitter compared to the UK, as these topics are treated as an issue of salience and urgency. In contrast, the UK’s discourse about climate change on Twitter is characterised by using more formal, logical, and longer words per sentence compared to Nigeria. Also, we confirm the geographical identifiability of tweets through a classification task using DistilBERT, which achieves 83% of accuracy.
Search