Casteism in India, but Not Racism - a Study of Bias in Word Embeddings of Indian Languages

Senthil Kumar B, Pranav Tiwari, Aman Chandra Kumar, Aravindan Chandrabose


Abstract
In this paper, we study gender bias in monolingual word embeddings of two Indian languages, Hindi and Tamil. Tamil is one of the classical languages of India and belongs to the Dravidian language family. In Indian society and culture, the counterpart of racism is casteism, a similar form of discrimination directed against the subgroups of people regarded as lower class, or Dalits. Measuring bias in the word embeddings with the WEAT score reveals that the embeddings are biased with respect to both gender and caste, in line with common stereotypical human biases.
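For readers unfamiliar with the WEAT score mentioned in the abstract, the following is a minimal sketch of the standard WEAT effect size (Caliskan et al., 2017): the difference in mean association of two target word sets X and Y with two attribute word sets A and B, divided by the standard deviation of the associations over all target words. It is not taken from the paper. The function names, the two-dimensional toy vectors, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration; real use would substitute Hindi or Tamil embedding vectors for the word lists under study.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B."""
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation over all target words (ddof=1).
    pooled_std = np.std(s_X + s_Y, ddof=1)
    return (np.mean(s_X) - np.mean(s_Y)) / pooled_std

# Hypothetical toy vectors standing in for real word embeddings.
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]   # target set 1
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]   # target set 2
A = [np.array([1.0, 0.0])]                         # attribute set 1
B = [np.array([0.0, 1.0])]                         # attribute set 2

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value indicates that X is more strongly associated with A (and Y with B) than the reverse; swapping X and Y flips the sign.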
Anthology ID:
2022.lateraisse-1.1
Volume:
Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Kolawole Adebayo, Rohan Nanda, Kanishk Verma, Brian Davis
Venue:
LATERAISSE
Publisher:
European Language Resources Association
Pages:
1–7
URL:
https://aclanthology.org/2022.lateraisse-1.1
Cite (ACL):
Senthil Kumar B, Pranav Tiwari, Aman Chandra Kumar, and Aravindan Chandrabose. 2022. Casteism in India, but Not Racism - a Study of Bias in Word Embeddings of Indian Languages. In Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference, pages 1–7, Marseille, France. European Language Resources Association.
Cite (Informal):
Casteism in India, but Not Racism - a Study of Bias in Word Embeddings of Indian Languages (B et al., LATERAISSE 2022)
PDF:
https://aclanthology.org/2022.lateraisse-1.1.pdf