MERA: A Comprehensive LLM Evaluation in Russian

Alena Fenogenova; Artem Chervyakov; Nikita Martynov; Anastasia Kozlova; Maria Tikhonova; Albina Akhmetgareeva; Anton Emelyanov; Denis Shevelev; Pavel Lebedev; Leonid Sinev; Ulyana Isaeva; Katerina Kolomeytseva; Daniil Moskovskiy; Elizaveta Goncharova; Nikita Savushkin; Polina Mikhailova; Anastasia Minaeva; Denis Dimitrov; Alexander Panchenko; Sergey Markov

doi:10.18653/v1/2024.acl-long.534

Abstract

Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). However, despite researchers’ attention and the rapid growth in LM applications, their capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce MERA, a new instruction benchmark for evaluating FM performance in the Russian language. The benchmark encompasses 21 evaluation tasks for generative models covering 10 skills and is supplied with private answer scoring to prevent data leakage. The paper introduces a methodology to evaluate FMs and LMs in fixed zero- and few-shot instruction settings that can be extended to other modalities. Alongside this evaluation methodology, we provide an open-source code base for the MERA assessment and a leaderboard with a submission system. We evaluate open LMs as baselines and find they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential ethical concerns and drawbacks.

Anthology ID: 2024.acl-long.534
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 9920–9948
URL: https://aclanthology.org/2024.acl-long.534/
DOI: 10.18653/v1/2024.acl-long.534
Cite (ACL): Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Anastasia Minaeva, Denis Dimitrov, Alexander Panchenko, and Sergey Markov. 2024. MERA: A Comprehensive LLM Evaluation in Russian. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9920–9948, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): MERA: A Comprehensive LLM Evaluation in Russian (Fenogenova et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.534.pdf
