Siddhesh Pawar
2025
Survey of Cultural Awareness in Language Models: Text and Beyond
Siddhesh Pawar | Junyeong Park | Jiho Jin | Arnav Arora | Junho Myung | Srishti Yadav | Faiz Ghifari Haznitrama | Inhwa Song | Alice Oh | Isabelle Augenstein
Computational Linguistics, Volume 51, Issue 3 - September 2025
Siddhesh Pawar | Junyeong Park | Jiho Jin | Arnav Arora | Junho Myung | Srishti Yadav | Faiz Ghifari Haznitrama | Inhwa Song | Alice Oh | Isabelle Augenstein
Computational Linguistics, Volume 51, Issue 3 - September 2025
Large-scale deployment of large language models (LLMs) in various applications, such as chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure inclusivity. Culture has been widely studied in psychology and anthropology, and there has been a recent surge in research on making LLMs more culturally inclusive, going beyond multilinguality and building on findings from psychology and anthropology. In this article, we survey efforts towards incorporating cultural awareness into text-based and multimodal LLMs. We start by defining cultural awareness in LLMs, taking definitions of culture from the anthropology and psychology literature as a point of departure. We then examine methodologies adopted for creating cross-cultural datasets, strategies for cultural inclusion in downstream tasks, and methodologies that have been used for benchmarking cultural awareness in LLMs. Further, we discuss the ethical implications of cultural alignment, the role of human–computer interaction in driving cultural inclusion in LLMs, and the role of cultural alignment in driving social science research. We finally provide pointers to future research based on our findings about gaps in the literature.1
2023
Evaluating Cross Lingual Transfer for Morphological Analysis: a Case Study of Indian Languages
Siddhesh Pawar | Pushpak Bhattacharyya | Partha Talukdar
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Siddhesh Pawar | Pushpak Bhattacharyya | Partha Talukdar
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Recent advances in pretrained multilingual models such as Multilingual T5 (mT5) have facilitated cross-lingual transfer by learning shared representations across languages. Leveraging pretrained multilingual models for scaling morphology analyzers to low-resource languages is a unique opportunity that has been under-explored so far. We investigate this line of research in the context of Indian languages, focusing on two important morphological sub-tasks: root word extraction and tagging morphosyntactic descriptions (MSD), viz., gender, number, and person (GNP). We experiment with six Indian languages from two language families (Dravidian and Indo-Aryan) to train a multilingual morphology analyzers for the first time for Indian languages. We demonstrate the usability of multilingual models for few-shot cross-lingual transfer through an average 7% increase in GNP tagging in a cross-lingual setting as compared to a monolingual setting through controlled experiments. We provide an overview of the state of the datasets available related to our tasks and point-out a few modeling limitations due to datasets. Lastly, we analyze the cross-lingual transfer of morphological tags for verbs and nouns, which provides a proxy for the quality of representations of word markings learned by the model.