Resmi Ramachandranpillai
2024
Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT
Ibrahim Ahmad | Shiran Dudy | Resmi Ramachandranpillai | Kenneth Church
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP
Large Language Models (LLMs), such as ChatGPT, are widely used to generate content for various purposes and audiences. However, these models may not reflect the cultural and emotional diversity of their users, especially for low-resource languages. In this paper, we investigate how ChatGPT represents Hausa culture and emotions. We compare responses generated by ChatGPT with those provided by native Hausa speakers on 37 culturally relevant questions. We analyze the emotions expressed in both sets of responses and use two similarity metrics to measure the alignment between human and ChatGPT responses. We also collect human participants' ratings of and feedback on ChatGPT responses. Our results show that ChatGPT responses have some level of similarity to human responses, but also exhibit gaps and biases in knowledge and awareness of Hausa culture and emotions. We discuss the implications and limitations of our methodology and analysis and suggest ways to improve the performance and evaluation of LLMs for low-resource languages.
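The abstract does not specify which similarity metrics were used, so the following is only a minimal illustrative sketch of the kind of alignment measurement described: scoring how closely a ChatGPT answer matches a native speaker's answer with a simple TF-IDF cosine similarity. The example question-answer pair and the function name `response_similarity` are hypothetical.

```python
# Illustrative sketch only (not the paper's actual metrics): lexical cosine
# similarity between a native Hausa speaker's answer and a ChatGPT answer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def response_similarity(human_answer: str, chatgpt_answer: str) -> float:
    """Return TF-IDF cosine similarity between two answers (0 = disjoint, 1 = identical)."""
    tfidf = TfidfVectorizer().fit_transform([human_answer, chatgpt_answer])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])


# Hypothetical pair for one of the 37 culturally relevant questions.
human = "Sannu da zuwa is a common greeting used to welcome a visitor."
chatgpt = "A visitor is usually welcomed with the greeting sannu da zuwa."
print(f"similarity = {response_similarity(human, chatgpt):.2f}")
```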
All Models are Wrong, But Some are Deadly: Inconsistencies in Emotion Detection in Suicide-related Tweets
Annika Marie Schoene | Resmi Ramachandranpillai | Tomo Lazovich | Ricardo A. Baeza-Yates
Proceedings of the Third Workshop on NLP for Positive Impact
Recent work in psychology has shown that people who experience mental health challenges are more likely to express their thoughts, emotions, and feelings on social media than to share them with a clinical professional. Distinguishing suicide-related content, such as suicide mentioned in a humorous context, from genuine expressions of suicidal ideation is essential to better understanding context and risk. In this paper, we provide a first analysis of the differences between emotion labels annotated by humans and labels predicted by three fine-tuned language models (LMs) for suicide-related content. We find that (i) there is little agreement between LMs and humans on the emotion labels of suicide-related Tweets and (ii) individual LMs predict similar emotion labels across all suicide-related categories. These findings lead us to question the credibility and usefulness of such methods in high-risk scenarios such as suicide ideation detection.
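As a hedged illustration of the agreement analysis the abstract describes (not the paper's evaluation code), one common way to quantify agreement between human emotion labels and an LM's predictions is Cohen's kappa, which corrects for chance agreement. The labels below are hypothetical.

```python
# Illustrative sketch: chance-corrected agreement between human emotion labels
# and labels predicted by a fine-tuned LM for suicide-related tweets.
from sklearn.metrics import cohen_kappa_score

# Hypothetical emotion labels for a handful of tweets.
human_labels = ["sadness", "joy", "sadness", "fear", "anger", "sadness"]
model_labels = ["sadness", "sadness", "sadness", "sadness", "anger", "sadness"]

kappa = cohen_kappa_score(human_labels, model_labels)
print(f"Cohen's kappa = {kappa:.2f}")  # low values indicate little agreement
```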