Wenyu Zhang
Cornell
2025
SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation
Wenyu Zhang | Wei En Ng | Lixin Ma | Yuwen Wang | Junqi Zhao | Allison Koenecke | Boyang Li | Lu Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Current vision-language models may grasp basic spatial cues and simple directions (e.g. left, right, front, back), but struggle with the multi-dimensional spatial reasoning necessary for human-like understanding and real-world applications. To address this gap, we develop SPHERE (Spatial Perception and Hierarchical Evaluation of REasoning), a hierarchical evaluation framework supported by a new human-annotated dataset. SPHERE systematically probes models across increasing levels of complexity, from fundamental skills to multi-skill integration and high-level reasoning that combines spatial, visual, and logical understanding. Benchmark evaluation of state-of-the-art models reveals significant deficiencies, especially in reasoning about distance and proximity, understanding both egocentric and allocentric perspectives, and applying spatial logic in physical contexts. These findings expose critical blind spots in existing models and underscore the need for more advanced spatial reasoning techniques, driving the development of vision-language models that align more closely with human spatial cognition.
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang | Xunlong Zou | Geyu Lin | Shuo Sun | Zhuohan Liu | Wenyu Zhang | Zhengyuan Liu | AiTi Aw | Nancy F. Chen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
We introduce AudioBench, a universal benchmark designed to evaluate Audio Large Language Models (AudioLLMs). It encompasses 8 distinct tasks and 26 datasets, 7 of which are newly proposed. The evaluation targets three main aspects: speech understanding, audio scene understanding, and voice understanding (paralinguistic). Despite recent advancements, there is no comprehensive benchmark for the instruction-following capabilities of AudioLLMs conditioned on audio signals. AudioBench addresses this gap by setting up datasets as well as desired evaluation metrics. We also evaluate the capabilities of five popular models and find that no single model excels consistently across all tasks. We outline the research outlook for AudioLLMs and anticipate that our open-sourced evaluation toolkit, data, and leaderboard will offer a robust testbed for future model developments.
2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia | Rahmad Mahendra | Salsabil Maulana Akbar | Lester James V. Miranda | Jennifer Santoso | Elyanah Aco | Akhdan Fadhilah | Jonibek Mansurov | Joseph Marvin Imperial | Onno P. Kampman | Joel Ruben Antony Moniz | Muhammad Ravi Shulthan Habibi | Frederikus Hudi | Railey Montalan | Ryan Ignatius | Joanito Agili Lopo | William Nixon | Börje F. Karlsson | James Jaya | Ryandito Diandaru | Yuze Gao | Patrick Amadeus | Bin Wang | Jan Christian Blaise Cruz | Chenxi Whitehouse | Ivan Halim Parmonangan | Maria Khelli | Wenyu Zhang | Lucky Susanto | Reynard Adha Ryanda | Sonny Lazuardi Hermawan | Dan John Velasco | Muhammad Dehan Al Kautsar | Willy Fitra Hendria | Yasmin Moslem | Noah Flynn | Muhammad Farid Adilazuarda | Haochen Li | Johanes Lee | R. Damanhuri | Shuo Sun | Muhammad Reza Qorib | Amirbek Djanibekov | Wei Qi Leong | Quyet V. Do | Niklas Muennighoff | Tanrada Pansuwan | Ilham Firdausi Putra | Yan Xu | Tai Ngee Chia | Ayu Purwarianti | Sebastian Ruder | William Tjhi | Peerat Limkonchotiwat | Alham Fikri Aji | Sedrick Keh | Genta Indra Winata | Ruochen Zhang | Fajri Koto | Zheng-Xin Yong | Samuel Cahyawijaya
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, through a collaborative movement, we introduce SEACrowd, a comprehensive resource center that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in Southeast Asia.
Co-authors
- Shuo Sun 2
- Bin Wang 2
- Elyanah Aco 1
- Muhammad Farid Adilazuarda 1
- Alham Fikri Aji 1
- Salsabil Maulana Akbar 1
- Muhammad Dehan Al Kautsar 1
- Patrick Amadeus 1
- Aiti Aw 1
- Samuel Cahyawijaya 1
- Nancy Chen 1
- Tai Ngee Chia 1
- Jan Christian Blaise Cruz 1
- R. Damanhuri 1
- Ryandito Diandaru 1
- Amirbek Djanibekov 1
- Quyet V. Do 1
- Akhdan Fadhilah 1
- Noah Flynn 1
- Yuze Gao 1
- Muhammad Ravi Shulthan Habibi 1
- Willy Fitra Hendria 1
- Sonny Lazuardi Hermawan 1
- Frederikus Hudi 1
- Ryan Ignatius 1
- Joseph Marvin Imperial 1
- James Jaya 1
- Onno P. Kampman 1
- Börje F. Karlsson 1
- Sedrick Keh 1
- Maria Khelli 1
- Allison Koenecke 1
- Fajri Koto 1
- Johanes Lee 1
- Wei Qi Leong 1
- Haochen Li 1
- Boyang Li 1
- Peerat Limkonchotiwat 1
- Geyu Lin 1
- Zhuohan Liu 1
- Zhengyuan Liu 1
- Joanito Agili Lopo 1
- Holy Lovenia 1
- Lixin Ma 1
- Rahmad Mahendra 1
- Jonibek Mansurov 1
- Lester James Validad Miranda 1
- Joel Ruben Antony Moniz 1
- Jann Railey Montalan 1
- Yasmin Moslem 1
- Niklas Muennighoff 1
- Wei En Ng 1
- William Nixon 1
- Tanrada Pansuwan 1
- Ivan Halim Parmonangan 1
- Ayu Purwarianti 1
- Ilham Firdausi Putra 1
- Muhammad Reza Qorib 1
- Sebastian Ruder 1
- Reynard Adha Ryanda 1
- Jennifer Santoso 1
- Lucky Susanto 1
- William Tjhi 1
- Dan John Velasco 1
- Yuwen Wang 1
- Lu Wang 1
- Chenxi Whitehouse 1
- Genta Indra Winata 1
- Yan Xu 1
- Zheng Xin Yong 1
- Ruochen Zhang 1
- Junqi Zhao 1
- Xunlong Zou 1