Anton Emelyanov
2026
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov | Ulyana Isaeva | Anton Emelyanov | Artem Safin | Maria Tikhonova | Alexander Kharitonov | Yulia Lyakh | Petr Surovtsev | Denis Shevelev | Vildan Saburov | Vasily Konovalov | Elisei Rykov | Ivan Sviridov | Amina Miftakhova | Ilseyar Alimova | Alexander Panchenko | Alexander Kapitanov | Alena Fenogenova
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal large language models (MLLMs) are currently at the center of research attention, showing rapid progress in scale and capabilities, yet their intelligence, limitations, and risks remain insufficiently understood. To address these issues, particularly in the context of the Russian language, where no multimodal benchmarks currently exist, we introduce MERA Multi, an open multimodal evaluation framework for Russian-language architectures. The benchmark is instruction-based and encompasses text, image, audio, and video modalities, comprising 18 newly constructed evaluation tasks for both general-purpose models and modality-specific architectures (image-to-text, video-to-text, and audio-to-text). Our contributions include: (i) a universal taxonomy of multimodal abilities; (ii) 18 datasets created entirely from scratch with attention to Russian cultural and linguistic specificity, unified prompts, and metrics; (iii) baseline results for both closed-source and open-source models; (iv) a methodology for preventing benchmark leakage, including watermarking for private sets. While our current focus is on Russian, the proposed benchmark provides a replicable methodology for constructing multimodal benchmarks in typologically diverse languages, particularly within the Slavic language family.
FiMMIA: scaling semantic perturbation-based membership inference across modalities
Anton Emelyanov | Sergei Kudriashov | Alena Fenogenova
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Membership Inference Attacks (MIAs) aim to determine whether a specific data point was included in the training set of a target model. Although numerous methods have been developed for detecting data contamination in large language models (LLMs), their performance on multimodal LLMs (MLLMs) falls short due to the instabilities introduced through multimodal component adaptation and possible distribution shifts across multiple inputs. In this work, we investigate multimodal membership inference and address two issues: first, by identifying distribution shifts in the existing datasets, and second, by releasing an extended baseline pipeline to detect them. We also generalize perturbation-based membership inference methods to MLLMs and release FiMMIA — a modular Framework for Multimodal MIA. We propose training a neural network to analyze the target model’s behavior on perturbed inputs, capturing interactions between semantic domains and loss values on members and non-members in the local neighborhood of each sample. Comprehensive evaluations on various fine-tuned multimodal models demonstrate the effectiveness of our perturbation-based membership inference attacks in multimodal settings.
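The core intuition behind perturbation-based membership inference, as the abstract describes it, is that a model behaves differently on training members than on non-members in the local neighborhood of a sample. A minimal sketch of that idea is below; all names (`membership_features`, `is_member`, the toy loss surface) are illustrative assumptions, not the FiMMIA API, and the threshold stands in for the learned classifier the paper trains:

```python
import random

def membership_features(model_loss, x, perturb, n_neighbors=8, seed=0):
    """Compute membership features for sample x: its own loss plus loss
    statistics over randomly perturbed neighbors. Training members tend
    to sit in sharper loss minima, so the (neighbor - base) gap is larger."""
    rng = random.Random(seed)
    base = model_loss(x)
    neighbor_losses = [model_loss(perturb(x, rng)) for _ in range(n_neighbors)]
    mean_nb = sum(neighbor_losses) / n_neighbors
    return {"base_loss": base, "neighbor_mean": mean_nb, "gap": mean_nb - base}

def is_member(features, gap_threshold=0.1):
    """Threshold stand-in for the learned classifier from the paper."""
    return features["gap"] > gap_threshold

# Toy loss surface: sharp minimum at x = 3 (a "member"), flat far away.
# A real attack would instead query an MLLM's loss on perturbed inputs.
toy_loss = lambda x: min(abs(x - 3.0), 2.0)
add_noise = lambda x, rng: x + rng.uniform(-1.0, 1.0)
```

On this toy surface, a point at the minimum shows a positive loss gap under perturbation, while a point on the flat region shows none, which is exactly the member/non-member signal the classifier would learn from.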
2024
MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova | Artem Chervyakov | Nikita Martynov | Anastasia Kozlova | Maria Tikhonova | Albina Akhmetgareeva | Anton Emelyanov | Denis Shevelev | Pavel Lebedev | Leonid Sinev | Ulyana Isaeva | Katerina Kolomeytseva | Daniil Moskovskiy | Elizaveta Goncharova | Nikita Savushkin | Polina Mikhailova | Anastasia Minaeva | Denis Dimitrov | Alexander Panchenko | Sergey Markov
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). However, despite researchers’ attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce a new instruction benchmark, MERA, oriented towards the FMs’ performance on the Russian language. The benchmark encompasses 21 evaluation tasks for generative models covering 10 skills and is supplied with private answer scoring to prevent data leakage. The paper introduces a methodology to evaluate FMs and LMs in fixed zero- and few-shot instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential ethical concerns and drawbacks.
2020
Humans Keep It One Hundred: an Overview of AI Journey
Tatiana Shavrina | Anton Emelyanov | Alena Fenogenova | Vadim Fomin | Vladislav Mikhailov | Andrey Evlampiev | Valentin Malykh | Vladimir Larin | Alex Natekin | Aleksandr Vatulin | Peter Romov | Daniil Anastasiev | Nikolai Zinov | Andrey Chertok
Proceedings of the Twelfth Language Resources and Evaluation Conference
Artificial General Intelligence (AGI) is showing growing performance in numerous applications - beating human performance in Chess and Go, using knowledge bases and text sources to answer questions (SQuAD), and even passing human examinations (Aristo project). In this paper, we describe the results of AI Journey, a competition of AI systems aimed at improving AI performance on knowledge bases, reasoning, and text generation. Competing systems pass the final native language exam (in Russian), including versatile grammar tasks (test and open questions) and an essay, achieving a high score of 69%, with 68% being an average human result. During the competition, a baseline for the task and essay parts was proposed, and 80+ systems were submitted, showing different approaches to task understanding and reasoning. All the data and solutions can be found on GitHub: https://github.com/sberbank-ai/combined_solution_aij2019
2019
Multilingual Named Entity Recognition Using Pretrained Embeddings, Attention Mechanism and NCRF
Anton Emelyanov | Ekaterina Artemova
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
In this paper we tackle the multilingual named entity recognition task. We use the BERT language model as embeddings with a bidirectional recurrent network, attention, and NCRF on top. We apply multilingual BERT only as an embedder, without any fine-tuning. We test our model on the dataset of the BSNLP shared task, which consists of texts in Bulgarian, Czech, Polish, and Russian.
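The NCRF layer at the top of the architecture above decodes the best tag sequence jointly rather than per token. A generic Viterbi decoder over emission and transition scores captures that final step; this is a textbook sketch under simplified assumptions (plain Python lists, handcrafted scores), not the authors' implementation:

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence for a CRF layer.

    emissions:   T x K list of per-token tag scores (from the BiRNN).
    transitions: K x K list of tag-to-tag transition scores.
    Returns the best sequence of tag indices of length T.
    """
    T, K = len(emissions), len(emissions[0])
    score = list(emissions[0])            # best score ending in each tag
    backptr = [[0] * K for _ in range(T)]  # best previous tag per step
    for t in range(1, T):
        new_score = [0.0] * K
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + transitions[i][j])
            backptr[t][j] = best_i
            new_score[j] = score[best_i] + transitions[best_i][j] + emissions[t][j]
        score = new_score
    # Follow back-pointers from the best final tag.
    best = [max(range(K), key=lambda j: score[j])]
    for t in range(T - 1, 0, -1):
        best.append(backptr[t][best[-1]])
    return best[::-1]
```

Transition scores are what let the CRF enforce sequence-level constraints (e.g. penalizing an I- tag that does not follow a matching B- tag), which per-token argmax over emissions alone cannot do.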
Co-authors
- Alena Fenogenova 4
- Artem Chervyakov 2
- Ulyana Isaeva 2
- Alexander Panchenko 2
- Denis Shevelev 2
- Maria Tikhonova 2
- Albina Akhmetgareeva 1
- Ilseyar Alimova 1
- Daniil Anastasiev 1
- Ekaterina Artemova 1
- Andrey Chertok 1
- Denis Dimitrov 1
- Andrey Evlampiev 1
- Vadim Fomin 1
- Elizaveta Goncharova 1
- Alexander Kapitanov 1
- Alexander Kharitonov 1
- Katerina Kolomeytseva 1
- Vasily Konovalov 1
- Anastasia Kozlova 1
- Sergei Kudriashov 1
- Vladimir Larin 1
- Pavel Lebedev 1
- Yulia Lyakh 1
- Valentin Malykh 1
- Sergey Markov 1
- Nikita Martynov 1
- Amina Miftakhova 1
- Vladislav Mikhailov 1
- Polina Mikhailova 1
- Anastasia Minaeva 1
- Daniil Moskovskiy 1
- Alex Natekin 1
- Peter Romov 1
- Elisei Rykov 1
- Vildan Saburov 1
- Artem Safin 1
- Nikita Savushkin 1
- Tatiana Shavrina 1
- Leonid Sinev 1
- Petr Surovtsev 1
- Ivan Sviridov 1
- Aleksandr Vatulin 1
- Nikolai Zinov 1