Anisia Katinskaia - ACL Anthology

Anisia Katinskaia

2025

Estimation of Text Difficulty in the Context of Language Learning
Anisia Katinskaia | Anh-Duc Vu | Jue Hou | Ulla Vanhatalo | Yiheng Wu | Roman Yangarber
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Easy language and text simplification are currently topical research questions, with important applications in many contexts, and with various approaches under active investigation, including prompt-based methods. The estimation of the level of difficulty of a text becomes a crucial challenge when the estimator is employed in a simplification workflow as a quality-control mechanism. It can act as a critic in frameworks where it can guide other models, which are responsible for generating text at a specified level of difficulty, as determined by the user’s needs.We present our work in the context of simplified Finnish. We discuss problems in collecting corpora for training models for estimation of text difficulty, and our experiments with estimation models.The results of the experiments are promising: the models appear usable both for assessment and for deployment as a component in a larger simplification framework.

A Bayesian Approach to Inferring Prerequisite Structures and Topic Difficulty in Language Learning
Anh-Duc Vu | Jue Hou | Anisia Katinskaia | Ching-Fan Sheu | Roman Yangarber
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Understanding how linguistic topics are related to each another is essential for designing effective and adaptive second-language (L2) instruction. We present a data-driven framework to model topic dependencies and their difficulty within a L2 learning curriculum. First, we estimate topic difficulty and student ability using a three-parameter Item Response Theory (IRT) model. Second, we construct topic-level knowledge graphs—as directed acyclic graphs (DAGs)—to capture the prerequisite relations among the topics, comparing a threshold-based method with the statistical Grow-Shrink Markov Blanket algorithm. Third, we evaluate the alignment between IRT-inferred topic difficulty and the structure of the graphs using edge-level and global ordering metrics. Finally, we compare the IRT-based estimates of learner ability with assessments of the learners provided by teachers to validate the model’s effectiveness in capturing learner proficiency. Our results show a promising agreement between the inferred graphs, IRT estimates, and human teachers’ assessments, highlighting the framework’s potential to support personalized learning and adaptive curriculum design in intelligent tutoring systems.

Know-AI at TSAR 2025 Shared Task Difficulty-aware Text Simplification System
Yiheng Wu | Anisia Katinskaia | Jue Hou | Roman Yangarber
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

Text simplification is an active research topic with applications in multiple domains. In a simplification pipeline, assessment of text difficulty plays a crucial role as a quality control mechanism it acts as a critic and guides models to generate text at the difficulty level required by the user. This paper presents our Difficulty-aware Text Simplification System. We evaluate our pipeline using the TSAR shared task dataset and discuss challenges in constructing corpora for training models to assess text difficulty.

2024

Probing the Category of Verbal Aspect in Transformer Language Models
Anisia Katinskaia | Roman Yangarber
Findings of the Association for Computational Linguistics: NAACL 2024

We investigate how pretrained language models (PLM) encode the grammatical category of verbal aspect in Russian. Encoding of aspect in transformer LMs has not been studied previously in any language. A particular challenge is posed by ”alternative contexts”: where either the perfective or the imperfective aspect is suitable grammatically and semantically. We perform probing using BERT and RoBERTa on alternative and non-alternative contexts. First, we assess the models’ performance on aspect prediction, via behavioral probing. Next, we examine the models’ performance when their contextual representations are substituted with counterfactual representations, via causal probing. These counterfactuals alter the value of the “boundedness” feature—a semantic feature, which characterizes the action in the context. Experiments show that BERT and RoBERTa do encode aspect—mostly in their final layers. The counterfactual interventions affect perfective and imperfective in opposite ways, which is consistent with grammar: perfective is positively affected by adding the meaning of boundedness, and vice versa. The practical implications of our probing results are that fine-tuning only the last layers of BERT on predicting aspect is faster and more effective than fine-tuning the whole model. The model has high predictive uncertainty about aspect in alternative contexts, which tend to lack explicit hints about the boundedness of the described action.

What Do Transformers Know about Government?
Jue Hou | Anisia Katinskaia | Lari Kotilainen | Sathianpong Trangcasanchai | Anh-Duc Vu | Roman Yangarber
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper investigates what insights about linguistic features and what knowledge about the structure of natural language can be obtained from the encodings in transformer language models. In particular, we explore how BERT encodes the government relation between constituents in a sentence. We use several probing classifiers, and data from two morphologically rich languages. Our experiments show that information about government is encoded across all transformer layers, but predominantly in the early layers of the model. We find that, for both languages, a small number of attention heads encode enough information about the government relations to enable us to train a classifier capable of discovering new, previously unknown types of government, never seen in the training data. Currently, data is lacking for the research community working on grammatical constructions, and government in particular. We release the Government Bank—a dataset defining the government relations for thousands of lemmas in the languages in our experiments.

GPT-3.5 for Grammatical Error Correction
Anisia Katinskaia | Roman Yangarber
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper investigates the application of GPT-3.5 for Grammatical Error Correction (GEC) in multiple languages in several settings: zero-shot GEC, fine-tuning for GEC, and using GPT-3.5 to re-rank correction hypotheses generated by other GEC models. In the zero-shot setting, we conduct automatic evaluations of the corrections proposed by GPT-3.5 using several methods: estimating grammaticality with language models (LMs), the Scribendy test, and comparing the semantic embeddings of sentences. GPT-3.5 has a known tendency to over-correct erroneous sentences and propose alternative corrections. For several languages, such as Czech, German, Russian, Spanish, and Ukrainian, GPT-3.5 substantially alters the source sentences, including their semantics, which presents significant challenges for evaluation with reference-based metrics. For English, GPT-3.5 demonstrates high recall, generates fluent corrections, and generally preserves sentence semantics. However, human evaluation for both English and Russian reveals that, despite its strong error-detection capabilities, GPT-3.5 struggles with several error types, including punctuation mistakes, tense errors, syntactic dependencies between words, and lexical compatibility at the sentence level.

Intelligent Tutor to Support Teaching and Learning of Tatar
Alsu Zakirova | Jue Hou | Anisia Katinskaia | Anh-Duc Vu | Roman Yangarber
Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)

This paper presents our work on tools to support the Tatar language, using Revita, a web-based Intelligent Tutoring System for language teaching and learning. The system allows the users — teachers and learners — to upload arbitrary authentic texts, and automatically creates exercises based on these texts that engage the learners in active production of language. It provides graduated feedback when they make mistakes, and performs continuous assessment, based on which the system selects exercises for the learners at the appropriate level. The assessment also helps the students maintain their learning pace, and helps the teachers to monitor their progress.The paper describes the functionality currently implemented for Tatar, which enables learners — who possess basic proficiency beyond the beginner level — to improve their competency, using texts of their choice as learning content. Support for Tatar is being developed to increase public interest in learning the language of this important regional minority, as well as to to provide tools for improving fluency to “heritage speakers” — those who have substantial passive competency, but lack active fluency and need support for regular practice.

2023

Linguistic Constructs Represent the Domain Model in Intelligent Language Tutoring
Anisia Katinskaia | Jue Hou | Anh-duc Vu | Roman Yangarber
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

This paper presents the development of the AI-based language-learning platform, Revita. It is an intelligent online tutor, developed to support learners of multiple languages, from lower-intermediate toward advanced levels. It has been in pilot use with hundreds of students at several universities, whose feedback and needs shape the development. One of the main emerging features of Revita is the system of linguistic constructs to represent the domain knowledge. The system of constructs is developed in collaboration with experts in language pedagogy. Constructs define the types of exercises, the content of the feedback, and enable detailed modeling and evaluation of learner progress.

Grammatical Error Correction for Sentence-level Assessment in Language Learning
Anisia Katinskaia | Roman Yangarber
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

The paper presents experiments on using a Grammatical Error Correction (GEC) model to assess the correctness of answers that language learners give to grammar exercises. We explored whether a GEC model can be applied in the language learning context for a language with complex morphology. We empirically check a hypothesis that a GEC model corrects only errors and leaves correct answers unchanged. We perform a test on assessing learner answers in a real but constrained language-learning setup: the learners answer only fill-in-the-blank and multiple-choice exercises. For this purpose, we use ReLCo, a publicly available manually annotated learner dataset in Russian (Katinskaia et al., 2022). In this experiment, we fine-tune a large-scale T5 language model for the GEC task and estimate its performance on the RULEC-GEC dataset (Rozovskaya and Roth, 2019) to compare with top-performing models. We also release an updated version of the RULEC-GEC test set, manually checked by native speakers. Our analysis shows that the GEC model performs reasonably well in detecting erroneous answers to grammar exercises and potentially can be used for best-performing error types in a real learning setup. However, it struggles to assess answers which were tagged by human annotators as alternative-correct using the aforementioned hypothesis. This is in large part due to a still low recall in correcting errors, and the fact that the GEC model may modify even correct words—it may generate plausible alternatives, which are hard to evaluate against the gold-standard reference.

Effects of sub-word segmentation on performance of transformer language models
Jue Hou | Anisia Katinskaia | Anh-Duc Vu | Roman Yangarber
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Language modeling is a fundamental task in natural language processing, which has been thoroughly explored with various architectures and hyperparameters. However, few studies focus on the effect of sub-word segmentation on the performance of language models (LMs). In this paper, we compare GPT and BERT models trained with the statistical segmentation algorithm BPE vs. two unsupervised algorithms for morphological segmentation — Morfessor and StateMorph. We train the models for several languages — including ones with very rich morphology — and compare their performance with different segmentation algorithms, vocabulary sizes, and model sizes. The results show that training with morphological segmentation allows the LMs to: (1) achieve lower perplexity, (2) converge more efficiently in terms of training time, and (3) achieve equivalent or better evaluation scores on downstream tasks. Lastly, we show that (4) LMs of smaller size using morphological segmentation can perform comparably to models of larger size trained with BPE — both in terms of (1) perplexity and (3) scores on downstream tasks. Points (2) and (4) impact on sustainability, since they reduce the model cost; and while 2 reduces cost only in the training phase, 4 does so also in the inference phase.

2022

Applying Gamification Incentives in the Revita Language-learning System
Jue Hou | Ilmari Kylliäinen | Anisia Katinskaia | Giacomo Furlan | Roman Yangarber
Proceedings of the 9th Workshop on Games and Natural Language Processing within the 13th Language Resources and Evaluation Conference

We explore the importance of gamification features in a language-learning platform designed for intermediate-to-advanced learners. Our main thesis is: learning toward advanced levels requires a massive investment of time. If the learner engages in more practice sessions, and if the practice sessions are longer, we can expect the results to be better. This principle appears to be tautologically self-evident. Yet, keeping the learner engaged in general—and building gamification features in particular—requires substantial efforts on the part of developers. Our goal is to keep the learner engaged in long practice sessions over many months—rather than for the short-term. This creates a conflict: In academic research on language learning, resources are typically scarce, and gamification usually is not considered an essential priority for allocating resources. We argue in favor of giving serious consideration to gamification in the language-learning setting—as a means of enabling in-depth research. In this paper, we introduce several gamification incentives in the Revita language-learning platform. We discuss the problems in obtaining quantitative measures of the effectiveness of gamification features.

Semi-automatically Annotated Learner Corpus for Russian
Anisia Katinskaia | Maria Lebedeva | Jue Hou | Roman Yangarber
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present ReLCo— the Revita Learner Corpus—a new semi-automatically annotated learner corpus for Russian. The corpus was collected while several thousand L2 learners were performing exercises using the Revita language-learning system. All errors were detected automatically by the system and annotated by type. Part of the corpus was annotated manually—this part was created for further experiments on automatic assessment of grammatical correctness. The Learner Corpus provides valuable data for studying patterns of grammatical errors, experimenting with grammatical error detection and grammatical error correction, and developing new exercises for language learners. Automating the collection and annotation makes the process of building the learner corpus much cheaper and faster, in contrast to the traditional approach of building learner corpora. We make the data publicly available.

2021

Assessing Grammatical Correctness in Language Learning
Anisia Katinskaia | Roman Yangarber
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

We present experiments on assessing the grammatical correctness of learners’ answers in a language-learning System (references to the System, and the links to the released data and code are withheld for anonymity). In particular, we explore the problem of detecting alternative-correct answers: when more than one inflected form of a lemma fits syntactically and semantically in a given context. We approach the problem with the methods for grammatical error detection (GED), since we hypothesize that models for detecting grammatical mistakes can assess the correctness of potential alternative answers in a learning setting. Due to the paucity of training data, we explore the ability of pre-trained BERT to detect grammatical errors and then fine-tune it using synthetic training data. In this work, we focus on errors in inflection. Our experiments show a. that pre-trained BERT performs worse at detecting grammatical irregularities for Russian than for English; b. that fine-tuned BERT yields promising results on assessing the correctness of grammatical exercises; and c. establish a new benchmark for Russian. To further investigate its performance, we compare fine-tuned BERT with one of the state-of-the-art models for GED (Bell et al., 2019) on our dataset and RULEC-GEC (Rozovskaya and Roth, 2019). We release the manually annotated learner dataset, used for testing, for general use.

2020

Using Crowdsourced Exercises for Vocabulary Training to Expand ConceptNet
Christos Rodosthenous | Verena Lyding | Federico Sangati | Alexander König | Umair ul Hassan | Lionel Nicolas | Jolita Horbacauskiene | Anisia Katinskaia | Lavinia Aparaschivei
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this work, we report on a crowdsourcing experiment conducted using the V-TREL vocabulary trainer which is accessed via a Telegram chatbot interface to gather knowledge on word relations suitable for expanding ConceptNet. V-TREL is built on top of a generic architecture implementing the implicit crowdsourding paradigm in order to offer vocabulary training exercises generated from the commonsense knowledge-base ConceptNet and – in the background – to collect and evaluate the learners’ answers to extend ConceptNet with new words. In the experiment about 90 university students learning English at C1 level, based on Common European Framework of Reference for Languages (CEFR), trained their vocabulary with V-TREL over a period of 16 calendar days. The experiment allowed to gather more than 12,000 answers from learners on different question types. In this paper we present in detail the experimental setup and the outcome of the experiment, which indicates the potential of our approach for both crowdsourcing data as well as fostering vocabulary skills.

We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how much these challenges have been addressed at present. Accordingly, we also report on on-going proof-of-concept efforts aiming at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of this generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.

Toward a Paradigm Shift in Collection of Learner Corpora
Anisia Katinskaia | Sardana Ivanova | Roman Yangarber
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present the first version of the longitudinal Revita Learner Corpus (ReLCo), for Russian. In contrast to traditional learner corpora, ReLCo is collected and annotated fully automatically, while students perform exercises using the Revita language-learning platform. The corpus currently contains 8 422 sentences exhibiting several types of errors—grammatical, lexical, orthographic, etc.—which were committed by learners during practice and were automatically annotated by Revita. The corpus provides valuable information about patterns of learner errors and can be used as a language resource for a number of research tasks, while its creation is much cheaper and faster than for traditional learner corpora. A crucial advantage of ReLCo that it grows continually while learners practice with Revita, which opens the possibility of creating an unlimited learner resource with longitudinal data collected over time. We make the pilot version of the Russian ReLCo publicly available.

2019

Multiple Admissibility: Judging Grammaticality using Unlabeled Data in Language Learning
Anisia Katinskaia | Sardana Ivanova
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

We present our work on the problem of Multiple Admissibility (MA) in language learning. Multiple Admissibility occurs in many languages when more than one grammatical form of a word fits syntactically and semantically in a given context. In second language (L2) education - in particular, in intelligent tutoring systems/computer-aided language learning (ITS/CALL) systems, which generate exercises automatically - this implies that multiple alternative answers are possible. We treat the problem as a grammaticality judgement task. We train a neural network with an objective to label sentences as grammatical or ungrammatical, using a “simulated learner corpus”: a dataset with correct text, and with artificial errors generated automatically. While MA occurs commonly in many languages, this paper focuses on learning Russian. We present a detailed classification of the types of constructions in Russian, in which MA is possible, and evaluate the model using a test set built from answers provided by the users of a running language learning system.

v-trel: Vocabulary Trainer for Tracing Word Relations - An Implicit Crowdsourcing Approach
Verena Lyding | Christos Rodosthenous | Federico Sangati | Umair ul Hassan | Lionel Nicolas | Alexander König | Jolita Horbacauskiene | Anisia Katinskaia
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we present our work on developing a vocabulary trainer that uses exercises generated from language resources such as ConceptNet and crowdsources the responses of the learners to enrich the language resource. We performed an empirical evaluation of our approach with 60 non-native speakers over two days, which shows that new entries to expand Concept-Net can efficiently be gathered through vocabulary exercises on word relations. We also report on the feedback gathered from the users and an expert from language teaching, and discuss the potential of the vocabulary trainer application from the user and language learner perspective. The feedback suggests that v-trel has educational potential, while in its current state some shortcomings could be identified.

Tools for supporting language learning for Sakha
Sardana Ivanova | Anisia Katinskaia | Roman Yangarber
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper presents an overview of the available linguistic resources for the Sakha language, and presents new tools for supporting language learning for Sakha. The essential resources include a morphological analyzer, digital dictionaries, and corpora of Sakha texts. Based on these resources, we implement a language-learning environment for Sakha in the Revita CALL platform. We extended an earlier, preliminary version of the morphological analyzer/transducer, built on the Apertium finite-state platform. The analyzer currently has an adequate level of coverage, between 86% and 89% on two Sakha corpora. Revita is a freely available online language learning platform for learners beyond the beginner level. We describe the tools for Sakha currently integrated into the Revita platform. To the best of our knowledge, at present, this is the first large-scale project undertaken to support intermediate-advanced learners of a minority Siberian language.

2018

Revita: a Language-learning Platform at the Intersection of ITS and CALL
Anisia Katinskaia | Javad Nouri | Roman Yangarber
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Grouping business news stories based on salience of named entities
Llorenç Escoter | Lidia Pivovarova | Mian Du | Anisia Katinskaia | Roman Yangarber
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user–reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience–a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.

Revita: a system for language learning and supporting endangered languages
Anisia Katinskaia | Javad Nouri | Roman Yangarber
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition

Venues