Alena Fenogenova


pdf bib
A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification
Varvara Logacheva | Daryna Dementieva | Irina Krotova | Alena Fenogenova | Irina Nikishina | Tatiana Shavrina | Alexander Panchenko
Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)

It is often difficult to reliably evaluate models which generate text. Among them, text style transfer is a particularly difficult to evaluate, because its success depends on a number of parameters.We conduct an evaluation of a large number of models on a detoxification task. We explore the relations between the manual and automatic metrics and find that there is only weak correlation between them, which is dependent on the type of model which generated text. Automatic metrics tend to be less reliable for better-performing models. However, our findings suggest that, ChrF and BertScore metrics can be used as a proxy for human evaluation of text detoxification to some extent.


pdf bib
Russian Paraphrasers: Paraphrase with Transformers
Alena Fenogenova
Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing

This paper studies the generation methods for paraphrasing in the Russian language. There are several transformer-based models (Russian and multilingual) trained on a collected corpus of paraphrases. We compare different models, contrast the quality of paraphrases using different ranking methods and apply paraphrasing methods in the context of augmentation procedure for different tasks. The contributions of the work are the combined paraphrasing dataset, fine-tuned generated models for Russian paraphrasing task and additionally the open source tool for simple usage of the paraphrasers.


pdf bib
Read and Reason with MuSeRC and RuCoS: Datasets for Machine Reading Comprehension for Russian
Alena Fenogenova | Vladislav Mikhailov | Denis Shevelev
Proceedings of the 28th International Conference on Computational Linguistics

The paper introduces two Russian machine reading comprehension (MRC) datasets, called MuSeRC and RuCoS, which require reasoning over multiple sentences and commonsense knowledge to infer the answer. The former follows the design of MultiRC, while the latter is a counterpart of the ReCoRD dataset. The datasets are included in RussianSuperGLUE, the Russian general language understanding benchmark. We provide a comparative analysis and demonstrate that the proposed tasks are relatively more complex as compared to the original ones for English. Besides, performance results of human solvers and BERT-based models show that MuSeRC and RuCoS represent a challenge for recent advanced neural models. We thus hope to facilitate research in the field of MRC for Russian and prompt the study of multi-hop reasoning in a cross-lingual scenario.

pdf bib
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark
Tatiana Shavrina | Alena Fenogenova | Emelyanov Anton | Denis Shevelev | Ekaterina Artemova | Valentin Malykh | Vladislav Mikhailov | Maria Tikhonova | Andrey Chertok | Andrey Evlampiev
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we introduce an advanced Russian general language understanding evaluation benchmark – Russian SuperGLUE. Recent advances in the field of universal language models and transformers require the development of a methodology for their broad diagnostics and testing for general intellectual skills - detection of natural language inference, commonsense reasoning, ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benchmark of nine tasks, collected and organized analogically to the SuperGLUE methodology, was developed from scratch for the Russian language. We also provide baselines, human level evaluation, open-source framework for evaluating models, and an overall leaderboard of transformer models for the Russian language. Besides, we present the first results of comparing multilingual models in the translated diagnostic test set and offer the first steps to further expanding or assessing State-of-the-art models independently of language.

pdf bib
Humans Keep It One Hundred: an Overview of AI Journey
Tatiana Shavrina | Anton Emelyanov | Alena Fenogenova | Vadim Fomin | Vladislav Mikhailov | Andrey Evlampiev | Valentin Malykh | Vladimir Larin | Alex Natekin | Aleksandr Vatulin | Peter Romov | Daniil Anastasiev | Nikolai Zinov | Andrey Chertok
Proceedings of the 12th Language Resources and Evaluation Conference

Artificial General Intelligence (AGI) is showing growing performance in numerous applications - beating human performance in Chess and Go, using knowledge bases and text sources to answer questions (SQuAD) and even pass human examination (Aristo project). In this paper, we describe the results of AI Journey, a competition of AI-systems aimed to improve AI performance on knowledge bases, reasoning and text generation. Competing systems pass the final native language exam (in Russian), including versatile grammar tasks (test and open questions) and an essay, achieving a high score of 69%, with 68% being an average human result. During the competition, a baseline for the task and essay parts was proposed, and 80+ systems were submitted, showing different approaches to task understanding and reasoning. All the data and solutions can be found on github