Chen Cecilia Liu
2026
Tailored Emotional LLM-Supporter: Enhancing Cultural Sensitivity
Chen Cecilia Liu | Hiba Arnaout | Nils Kovačić | Dana Atzil-Slonim | Iryna Gurevych
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) show promise in offering emotional support and generating empathetic responses for individuals in distress, but their ability to deliver culturally sensitive support remains underexplored due to a lack of resources. In this work, we introduce the first dataset designed for this task, spanning four cultures and including 1,729 distress messages, 1,523 cultural signals, and 1,041 support strategies with fine-grained emotional and cultural annotations. Leveraging this dataset, we (i) develop and test four adaptation strategies for guiding three state-of-the-art LLMs toward culturally sensitive responses; (ii) conduct comprehensive evaluations using LLM-as-a-Judge, in-culture human annotators, and clinical psychologists; (iii) show that adapted LLMs outperform anonymous online peer responses, and that simple cultural role-play is insufficient for cultural sensitivity; and (iv) explore the application of LLMs in clinical training, where experts highlight their potential in fostering cultural competence in novice therapists.
LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction
Junior Cedric Tonga | Chen Cecilia Liu | Iryna Gurevych | Fajri Koto
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) encode rich cultural knowledge learned from diverse web-scale data, offering an unprecedented opportunity to model cultural commonsense at scale. Yet this knowledge remains mostly implicit and unstructured, limiting its interpretability and use. We present an iterative, prompt-based framework for constructing a Cultural Commonsense Knowledge Graph (CCKG) that treats LLMs as cultural archives, systematically eliciting culture-specific entities, relations, and practices and composing them into multi-step inferential chains across languages. We evaluate CCKG on five countries with human judgments of cultural relevance, correctness, and path coherence. We find that the cultural knowledge graphs are better realized in English, even when the target culture is non-English (e.g., Chinese, Indonesian, Arabic), indicating uneven cultural encoding in current LLMs. Augmenting smaller LLMs with CCKG improves performance on cultural reasoning and story generation, with the largest gains from English chains. Our results show both the promise and limits of LLMs as cultural technologies and that chain-structured cultural knowledge is a practical substrate for culturally grounded NLP.
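The "multi-step inferential chains" in this abstract can be pictured as paths through a triple store. The sketch below is purely illustrative and not the paper's implementation: the `(head, relation, tail)` representation, the example triples, and the `compose_chains` helper are all assumptions introduced here to show how chain-structured cultural knowledge composes.

```python
from collections import defaultdict

def build_index(triples):
    """Index (head, relation, tail) triples by head entity."""
    index = defaultdict(list)
    for head, rel, tail in triples:
        index[head].append((rel, tail))
    return index

def compose_chains(index, start, depth):
    """Enumerate inferential chains of up to `depth` hops from `start`.

    Each chain is a list of triples where every tail feeds the next head.
    """
    if depth == 0 or start not in index:
        return [[]]  # end of chain: nothing left to append
    chains = []
    for rel, tail in index[start]:
        for rest in compose_chains(index, tail, depth - 1):
            chains.append([(start, rel, tail)] + rest)
    return chains

# Hypothetical cultural-commonsense triples for illustration only.
triples = [
    ("Indonesia", "celebrates", "Idul Fitri"),
    ("Idul Fitri", "involves", "mudik"),
    ("mudik", "means", "returning to one's hometown"),
]
chains = compose_chains(build_index(triples), "Indonesia", depth=3)
```

Here the single three-hop chain links a country to a practice and its meaning; prompting an LLM to verbalize such a chain is one way chain-structured knowledge can ground downstream reasoning.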
2025
Cultural Learning-Based Culture Adaptation of Language Models
Chen Cecilia Liu | Anna Korhonen | Iryna Gurevych
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Adapting large language models (LLMs) to diverse cultural values is a challenging task, as existing LLMs often reflect the values of specific groups by default and may cause harm to others. In this paper, we present CLCA, a novel framework for enhancing LLM alignment with cultural values based on cultural learning. The framework leverages simulated social interactions to generate conversations in which LLMs engage in role-playing within culturally adapted social scenarios, capturing implicit cultural norms for model fine-tuning. CLCA improves cultural value alignment across various model architectures, as measured by World Values Survey data, demonstrating the effectiveness of our proposed approach. Our results provide early evidence that understanding intent and social interactions can enhance cultural value adaptation in LLMs, highlighting the promise of training approaches based on cultural learning.
From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs
Farid Adilazuarda | Chen Cecilia Liu | Iryna Gurevych | Alham Fikri Aji
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Adapting cultural values in Large Language Models (LLMs) presents significant challenges, particularly due to biases and data limitations. Previous work aligns LLMs with different cultures using survey data, primarily from the World Values Survey (WVS). However, it remains unclear whether this approach effectively captures cultural nuances or produces distinct cultural representations for tasks like offensiveness classification. In this paper, we systematically investigate WVS-based training for cultural value adaptation and find that relying solely on survey data can homogenize cultural norms and interfere with factual knowledge. To address these issues, we propose augmenting WVS with encyclopedic and scenario-based cultural narratives from Wikipedia and NormAd. Our experiments across multiple cultures show that this approach captures more differentiated cultural values and improves downstream classification performance.
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
Chen Cecilia Liu | Iryna Gurevych | Anna Korhonen
Transactions of the Association for Computational Linguistics, Volume 13
The surge of interest in culture in NLP has inspired much recent research, but a shared understanding of “culture” remains unclear, making it difficult to evaluate progress in this emerging area. Drawing on prior research in NLP and related fields, we propose a fine-grained taxonomy of elements in culture that can provide a systematic framework for analyzing and understanding research progress. Using the taxonomy, we survey existing resources and methods for culturally aware and adapted NLP, providing an overview of the state of the art and the research gaps that still need to be filled.
2024
FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing
Chen Cecilia Liu | Jonas Pfeiffer | Ivan Vulić | Iryna Gurevych
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Standard fine-tuning of language models typically performs well on in-distribution data, but struggles to generalize under distribution shifts. In this work, we aim to improve the generalization of adapter-based cross-lingual task transfer, where such cross-language distribution shifts are inherent. We investigate scheduled unfreezing algorithms – originally proposed to mitigate catastrophic forgetting in transfer learning – for fine-tuning task adapters. Our experiments show that scheduled unfreezing methods close the gap to full fine-tuning and achieve stronger cross-lingual transfer performance, suggesting that these methods can go beyond merely mitigating catastrophic forgetting. Next, aiming to understand these empirical findings, we investigate the learning dynamics of scheduled unfreezing using Fisher Information. Our experiments reveal that scheduled unfreezing induces different learning dynamics compared to standard fine-tuning, and provide evidence that the dynamics of Fisher Information during training correlate with cross-lingual generalization performance. We additionally propose a general scheduled unfreezing algorithm that achieves an average improvement of 2 points across four datasets compared to standard fine-tuning, and provides empirical evidence for a theory-based justification of the heuristic unfreezing schedule for task adapter training.
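As background for the abstract above, scheduled unfreezing makes progressively more of the network trainable as fine-tuning proceeds. The sketch below is a generic linear top-down schedule, not the paper's specific algorithm: the function name, the linear pacing, and the layer indexing are illustrative assumptions.

```python
def unfrozen_layers(step, total_steps, num_layers):
    """Generic linear scheduled unfreezing (illustrative, not the paper's method).

    Start with only the top layer trainable and progressively unfreeze
    deeper layers as training proceeds. Layers are indexed 0 (bottom)
    to num_layers - 1 (top); returns the set of trainable layer indices.
    """
    frac = min(step / total_steps, 1.0)      # fraction of training elapsed
    k = 1 + int(frac * (num_layers - 1))     # number of top layers unfrozen
    return set(range(num_layers - k, num_layers))
```

In a framework like PyTorch, the returned set would typically be applied each step by toggling `requires_grad` on the corresponding adapter layers' parameters, so early training updates only the top of the stack.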
2023
Delving Deeper into Cross-lingual Visual Question Answering
Chen Cecilia Liu | Jonas Pfeiffer | Anna Korhonen | Ivan Vulić | Iryna Gurevych
Findings of the Association for Computational Linguistics: EACL 2023
Visual question answering (VQA) is one of the crucial vision-and-language tasks. Yet, existing VQA research has mostly focused on the English language, due to a lack of suitable evaluation resources. Previous work on cross-lingual VQA has reported poor zero-shot transfer performance of current multilingual multimodal Transformers, with large gaps to monolingual performance and without any deeper analysis. In this work, we delve deeper into the different aspects of cross-lingual VQA, aiming to understand the impact of 1) modeling methods and choices, including architecture, inductive bias, and fine-tuning; and 2) learning biases, including question types and modality biases in cross-lingual setups. The key results of our analysis are: 1) We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance, yielding +10 accuracy points over existing methods. 2) We analyze cross-lingual VQA across question types of varying complexity for different multilingual multimodal Transformers, and identify question types that are the most difficult to improve on. 3) We provide an analysis of modality biases present in training data and models, revealing why zero-shot performance gaps remain for certain question types and languages.