Svetlana Toldova


2024

Of Models and Men: Probing Neural Networks for Agreement Attraction with Psycholinguistic Data
Maxim Bazhukov | Ekaterina Voloshina | Sergey Pletenev | Arseny Anisimov | Oleg Serikov | Svetlana Toldova
Proceedings of the 28th Conference on Computational Natural Language Learning

Interpretability studies have played an important role in the field of NLP. They focus on the problems of how models encode information or, for instance, whether linguistic capabilities allow them to prefer grammatical sentences to ungrammatical ones. Recently, several studies examined whether the models demonstrate patterns similar to humans and whether they are sensitive to the phenomena of interference like humans’ grammaticality judgements, including the phenomenon of agreement attraction. In this paper, we probe BERT and GPT models on the syntactic phenomenon of agreement attraction in Russian using psycholinguistic data with syncretism. Working on a language with syncretism between some plural and singular forms allows us to differentiate between the effects of the surface form and of the underlying grammatical feature. Thus we can further investigate models’ sensitivity to this phenomenon and examine if the patterns of their behaviour are similar to human patterns. Moreover, we suggest a new way of comparing models’ and humans’ responses via statistical testing. We show that there are some similarities between models’ and humans’ results, while GPT is somewhat more aligned with human responses than BERT. Finally, preliminary results suggest that surface form syncretism influences attraction, perhaps more so than grammatical form syncretism.

2020

SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
Ekaterina Vylomova | Jennifer White | Elizabeth Salesky | Sabrina J. Mielke | Shijie Wu | Edoardo Maria Ponti | Rowan Hall Maudslay | Ran Zmigrod | Josef Valvoda | Svetlana Toldova | Francis Tyers | Elena Klyachko | Ilya Yegorov | Natalia Krizhanovsky | Paula Czarnowska | Irene Nikkarinen | Andrew Krizhanovsky | Tiago Pimentel | Lucas Torroba Hennigen | Christo Kirov | Garrett Nicolai | Adina Williams | Antonios Anastasopoulos | Hilaria Cruz | Eleanor Chodroff | Ryan Cotterell | Miikka Silfverberg | Mans Hulden
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language, such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems’ ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate the utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.

2019

Towards the Data-driven System for Rhetorical Parsing of Russian Texts
Elena Chistova | Maria Kobozeva | Dina Pisarevskaya | Artem Shelmanov | Ivan Smirnov | Svetlana Toldova
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank, the first Russian corpus annotated within the RST framework, are presented. Various lexical, quantitative, morphological, and semantic features were used. In rhetorical relation classification, an ensemble of a CatBoost model with selected features and a linear SVM model provides the best score (macro F1 = 54.67 ± 0.38). We discover that most of the important features for rhetorical relation classification are related to discourse connectives derived from the connectives lexicon for Russian and from other sources.

2017

Rhetorical relations markers in Russian RST Treebank
Svetlana Toldova | Dina Pisarevskaya | Margarita Ananyeva | Maria Kobozeva | Alexander Nasedkin | Sofia Nikiforova | Irina Pavlova | Alexey Shelepov
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

2016

Error analysis for anaphora resolution in Russian: new challenging issues for anaphora resolution task in a morphologically rich language
Svetlana Toldova | Ilya Azerkovich | Alina Ladygina | Anna Roitberg | Maria Vasilyeva
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

2013

Learning Computational Linguistics through NLP Evaluation Events: the experience of Russian evaluation initiative
Anastasia Bonch-Osmolovskaya | Svetlana Toldova | Olga Lyashevskaya
Proceedings of the Fourth Workshop on Teaching NLP and CL

2012

RU-EVAL-2012: Evaluating Dependency Parsers for Russian
Anastasia Gareyshina | Maxim Ionov | Olga Lyashevskaya | Dmitry Privoznov | Elena Sokolova | Svetlana Toldova
Proceedings of COLING 2012: Posters