Anar Rzayev
2025
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Jafar Isbarov | Arofat Akhundjanova | Mammad Hajili | Kavsar Huseynova | Dmitry Gaynullin | Anar Rzayev | Osman Tursun | Aizirek Turdubaeva | Ilshat Saetov | Rinat Kharisov | Saule Belginova | Ariana Kenbayeva | Amina Alisheva | Abdullatif Köksal | Samir Rustamov | Duygu Ataman
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Being able to thoroughly assess massive multi-task language understanding (MMLU) capabilities is essential for advancing the applicability of multilingual language models. However, preparing such benchmarks in a high-quality native language is often costly and therefore limits the representativeness of evaluation datasets. While recent efforts have focused on building more inclusive MMLU benchmarks, these are conventionally built using machine translation from high-resource languages, which may introduce errors and fail to account for the linguistic and cultural intricacies of the target languages. In this paper, we address the lack of native-language MMLU benchmarks, especially for the under-represented Turkic language family with its distinct morphosyntactic and cultural characteristics. We propose two benchmarks for Turkic-language MMLU. TUMLU is a comprehensive, multilingual, and natively developed language understanding benchmark specifically designed for Turkic languages. It consists of middle- and high-school level questions spanning 11 academic subjects in Azerbaijani, Crimean Tatar, Karakalpak, Kazakh, Kyrgyz, Tatar, Turkish, Uyghur, and Uzbek. We also present TUMLU-mini, a more concise, balanced, and manually verified subset of the dataset. Using this dataset, we systematically evaluate a diverse range of open and proprietary multilingual large language models (LLMs), including Claude, Gemini, GPT, and LLaMA, offering an in-depth analysis of their performance across different languages, subjects, and alphabets. To promote further research and development in multilingual language understanding, we release TUMLU-mini and all corresponding evaluation scripts.
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata | Frederikus Hudi | Patrick Amadeus Irawan | David Anugraha | Rifki Afina Putri | Wang Yutong | Adam Nohejl | Ubaidillah Ariq Prathama | Nedjma Ousidhoum | Afifa Amriani | Anar Rzayev | Anirban Das | Ashmari Pramodya | Aulia Adila | Bryan Wilie | Candy Olivia Mawalim | Cheng Ching Lam | Daud Abolade | Emmanuele Chersoni | Enrico Santus | Fariz Ikhwantri | Garry Kuwanto | Hanyang Zhao | Haryo Akbarianto Wibowo | Holy Lovenia | Jan Christian Blaise Cruz | Jan Wira Gotama Putra | Junho Myung | Lucky Susanto | Maria Angelica Riera Machin | Marina Zhukova | Michael Anugraha | Muhammad Farid Adilazuarda | Natasha Christabelle Santosa | Peerat Limkonchotiwat | Raj Dabre | Rio Alexander Audino | Samuel Cahyawijaya | Shi-Xiong Zhang | Stephanie Yulia Salim | Yi Zhou | Yinxuan Gui | David Ifeoluwa Adelani | En-Shiun Annie Lee | Shogo Okada | Ayu Purwarianti | Alham Fikri Aji | Taro Watanabe | Derry Tanti Wijaya | Alice Oh | Chong-Wah Ngo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data.
2024
Findings of the 2nd Shared Task on Multi-lingual Multi-task Information Retrieval at MRL 2024
Francesco Tinner | Raghav Mantri | Mammad Hajili | Chiamaka Chukwuneke | Dylan Massey | Benjamin A. Ajibade | Bilge Deniz Kocak | Abolade Dawud | Jonathan Atala | Hale Sirin | Kayode Olaleye | Anar Rzayev | Jafar Isbarov | Dursun Dashdamirov | David Adelani | Duygu Ataman
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)
Large language models (LLMs) demonstrate exceptional proficiency in both the comprehension and generation of textual data, particularly in English, a language for which extensive public benchmarks have been established across a wide range of natural language processing (NLP) tasks. Nonetheless, their performance in multilingual contexts and specialized domains remains less rigorously validated, raising questions about their reliability and generalizability across linguistically diverse and domain-specific settings. The second edition of the Shared Task on Multilingual Multitask Information Retrieval aims to provide a comprehensive and inclusive multilingual evaluation benchmark that aids in assessing the ability of multilingual LLMs to capture logical, factual, or causal relationships within lengthy text contexts and to generate language under sparse settings, particularly in scenarios with under-resourced languages. The shared task consists of two subtasks crucial to information retrieval: named entity recognition (NER) and reading comprehension (RC), in 7 data-scarce languages: Azerbaijani, Swiss German, Turkish and , which previously lacked annotated resources in information retrieval tasks. This year's edition specifically focuses on the multiple-choice question answering evaluation setting, which provides a more objective basis for comparing different methods across languages.
Co-authors
- David Ifeoluwa Adelani 2
- Duygu Ataman 2
- Mammad Hajili 2
- Jafar Isbarov 2
- Daud Abolade 1
- Aulia Adila 1
- Muhammad Farid Adilazuarda 1
- Alham Fikri Aji 1
- Benjamin A. Ajibade 1
- Arofat Akhundjanova 1
- Amina Alisheva 1
- Afifa Amriani 1
- David Anugraha 1
- Michael Anugraha 1
- Jonathan Atala 1
- Rio Alexander Audino 1
- Saule Belginova 1
- Samuel Cahyawijaya 1
- Emmanuele Chersoni 1
- Chiamaka Chukwuneke 1
- Jan Christian Blaise Cruz 1
- Raj Dabre 1
- Anirban Das 1
- Dursun Dashdamirov 1
- Abolade Dawud 1
- Dmitry Gaynullin 1
- Yinxuan Gui 1
- Frederikus Hudi 1
- Kavsar Huseynova 1
- Fariz Ikhwantri 1
- Patrick Amadeus Irawan 1
- Ariana Kenbayeva 1
- Rinat Kharisov 1
- Bilge Deniz Kocak 1
- Garry Kuwanto 1
- Abdullatif Köksal 1
- Cheng Ching Lam 1
- En-Shiun Annie Lee 1
- Peerat Limkonchotiwat 1
- Holy Lovenia 1
- Raghav Mantri 1
- Dylan Massey 1
- Candy Olivia Mawalim 1
- Junho Myung 1
- Chong-Wah Ngo 1
- Adam Nohejl 1
- Alice Oh 1
- Shogo Okada 1
- Kayode Olaleye 1
- Nedjma Ousidhoum 1
- Ashmari Pramodya 1
- Ubaidillah Ariq Prathama 1
- Ayu Purwarianti 1
- Jan Wira Gotama Putra 1
- Rifki Afina Putri 1
- Maria Angelica Riera Machin 1
- Samir Rustamov 1
- Ilshat Saetov 1
- Stephanie Yulia Salim 1
- Natasha Christabelle Santosa 1
- Enrico Santus 1
- Hale Sirin 1
- Lucky Susanto 1
- Francesco Tinner 1
- Aizirek Turdubaeva 1
- Osman Tursun 1
- Taro Watanabe 1
- Haryo Akbarianto Wibowo 1
- Derry Tanti Wijaya 1
- Bryan Wilie 1
- Genta Indra Winata 1
- Wang Yutong 1
- Shi-Xiong Zhang 1
- Hanyang Zhao 1
- Yi Zhou 1
- Marina Zhukova 1