Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi

Sourabrata Mukherjee; Atharva Mehta; Sougata Saha; Akhil Arora; Monojit Choudhury

doi:10.18653/v1/2025.emnlp-main.966

Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi

Sourabrata Mukherjee, Atharva Mehta, Sougata Saha, Akhil Arora, Monojit Choudhury

Abstract

The obligatory use of third-person honorifics is a distinctive feature of several South Asian languages, encoding nuanced socio-pragmatic cues such as power, age, gender, fame, and social distance.In this work, (i) We present the first large-scale study of third-person honorific pronoun and verb usage across 10,000 Hindi and Bengali Wikipedia articles with annotations linked to key socio-demographic attributes of the subjects, including gender, age group, fame, and cultural origin.(ii) Our analysis uncovers systematic intra-language regularities but notable cross-linguistic differences: honorifics are more prevalent in Bengali than in Hindi, while non-honorifics dominate while referring to infamous, juvenile, and culturally “exotic” entities. Notably, in both languages, and more prominently in Hindi, men are more frequently addressed with honorifics than women.(iii) To examine whether large language models (LLMs) internalize similar socio-pragmatic norms, we probe six LLMs using controlled generation and translation tasks over 1,000 culturally balanced entities. We find that LLMs diverge from Wikipedia usage, exhibiting alternative preferences in honorific selection across tasks, languages, and socio-demographic attributes. These discrepancies highlight gaps in the socio-cultural alignment of LLMs and open new directions for studying how LLMs acquire, adapt, or distort social-linguistic norms. Our code and data are publicly available at https://github.com/souro/honorific-wiki-llm

Anthology ID:: 2025.emnlp-main.966
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 19103–19126
Language:
URL:: https://aclanthology.org/2025.emnlp-main.966/
DOI:: 10.18653/v1/2025.emnlp-main.966
Bibkey:
Cite (ACL):: Sourabrata Mukherjee, Atharva Mehta, Sougata Saha, Akhil Arora, and Monojit Choudhury. 2025. Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 19103–19126, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi (Mukherjee et al., EMNLP 2025)
Copy Citation:
PDF:: https://aclanthology.org/2025.emnlp-main.966.pdf
Checklist:: 2025.emnlp-main.966.checklist.pdf

PDF Cite Search Checklist Fix data