Weihua Zheng

2026

BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models
Bryan Chen Zhengyu Tan | Weihua Zheng | Zhengyuan Liu | Nancy F. Chen | Hwaran Lee | Kenny Tsu Wei Choo | Roy Ka-Wei Lee
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

As vision-language models (VLMs) are deployed globally, their ability to understand culturally situated knowledge becomes essential. Yet, existing evaluations largely assess static recall or isolated visual grounding, leaving unanswered whether VLMs possess robust and transferable cultural understanding. We introduce ‘BLEnD-Vis‘, a multimodal, multicultural benchmark designed to evaluate the robustness of everyday cultural knowledge in VLMs across linguistic rephrasings and visual modalities. Building on the BLEnD dataset, ‘BLEnD-Vis‘ constructs 313 culturally grounded question templates spanning 16 regions and generates three aligned multiple-choice formats: (i) a text-only baseline querying from Region → Entity, (ii) an inverted text-only variant (Entity → Region), and (iii) a VQA-style version of (ii) with generated images. The resulting benchmark comprises 4,916 images and over 21,000 multiple-choice questions (MCQ) instances, validated through human annotation. ‘BLEnD-Vis‘ reveals significant fragility in current VLM cultural knowledge; models exhibit performance drops under linguistic rephrasing. While visual cues often aid performance, low cross-modal consistency highlights the challenges of robustly integrating textual and visual understanding, particularly in lower-resource regions. ‘BLEnD-Vis‘ thus provides a crucial testbed for systematically analysing cultural robustness and multimodal grounding, exposing limitations and guiding the development of more culturally competent VLMs. Code is available at https://github.com/Social-AI-Studio/BLEnD-Vis.

2024

pdf bib abs

Evaluating Code-Switching Translation with Large Language Models
Muhammad Huzaifah | Weihua Zheng | Nattapol Chanpaisit | Kui Wu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recent advances in large language models (LLMs) have shown they can match or surpass finetuned models on many natural language processing tasks. Currently, more studies are being carried out to assess whether this performance carries over across different languages. In this paper, we present a thorough evaluation of LLMs for the less well-researched code-switching translation setting, where inputs include a mixture of different languages. We benchmark the performance of six state-of-the-art LLMs across seven datasets, with GPT-4 and GPT-3.5 displaying strong ability relative to supervised translation models and commercial engines. GPT-4 was also found to be particularly robust against different code-switching conditions. Several methods to further improve code-switching translation are proposed including leveraging in-context learning and pivot translation. Through our code-switching experiments, we argue that LLMs show promising ability for cross-lingual understanding.

Co-authors

Roy Ka-Wei Lee 1

Zhengyuan Liu 1

Bryan Chen Zhengyu Tan 1

Kui Wu 1

Venues

Fix author