Ekaterina Kochmar


2024

pdf bib
Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing
Kv Aditya Srivatsa | Kaushal Maurya | Ekaterina Kochmar
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP

With the rapid development of LLMs, it is natural to ask how to harness their capabilities efficiently. In this paper, we explore whether it is feasible to direct each input query to a single most suitable LLM. To this end, we propose LLM routing for challenging reasoning tasks. Our extensive experiments suggest that such routing shows promise but is not feasible in all scenarios, so more robust approaches should be investigated to fill this gap.

pdf bib
What Makes Math Word Problems Challenging for LLMs?
Kv Aditya Srivatsa | Ekaterina Kochmar
Findings of the Association for Computational Linguistics: NAACL 2024

This paper investigates the question of what makes math word problems (MWPs) in English challenging for large language models (LLMs). We conduct an in-depth analysis of the key linguistic and mathematical characteristics of MWPs. In addition, we train feature-based classifiers to better understand the impact of each feature on the overall difficulty of MWPs for prominent LLMs and investigate whether this helps predict how well LLMs fare against specific categories of MWPs.

pdf bib
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
Ekaterina Kochmar | Marie Bexte | Jill Burstein | Andrea Horbach | Ronja Laarmann-Quante | Anaïs Tack | Victoria Yaneva | Zheng Yuan
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

pdf bib
REFeREE: A REference-FREE Model-Based Metric for Text Simplification
Yichen Huang | Ekaterina Kochmar
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Text simplification lacks a universal standard of quality, and annotated reference simplifications are scarce and costly. We propose to alleviate such limitations by introducing REFeREE, a reference-free model-based metric with a 3-stage curriculum. REFeREE leverages an arbitrarily scalable pretraining stage and can be applied to any quality standard as long as a small number of human annotations are available. Our experiments show that our metric outperforms existing reference-based metrics in predicting overall ratings and reaches competitive and consistent performance in predicting specific ratings while requiring no reference simplifications at inference time.

pdf bib
PetKaz at SemEval-2024 Task 3: Advancing Emotion Classification with an LLM for Emotion-Cause Pair Extraction in Conversations
Roman Kazakov | Kseniia Petukhova | Ekaterina Kochmar
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we present our submission to the SemEval-2023 Task 3 “The Competition of Multimodal Emotion Cause Analysis in Conversations”, focusing on extracting emotion-cause pairs from dialogs. Specifically, our approach relies on combining fine-tuned GPT-3.5 for emotion classification and using a BiLSTM-based neural network to detect causes. We score 2nd in the ranking for Subtask 1, demonstrating the effectiveness of our approach through one of the highest weighted-average proportional F1 scores recorded at 0.264.

pdf bib
PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?
Kseniia Petukhova | Roman Kazakov | Ekaterina Kochmar
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we present our submission to the SemEval-2024 Task 8 “Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection”, focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach relies on combining embeddings from the RoBERTa-base with diversity features and uses a resampled training set. We score 16th from 139 in the ranking for Subtask A, and our results show that our approach is generalizable across unseen models and domains, achieving an accuracy of 0.91.

2023

pdf bib
Automatic Readability Assessment for Closely Related Languages
Joseph Marvin Imperial | Ekaterina Kochmar
Findings of the Association for Computational Linguistics: ACL 2023

In recent years, the main focus of research on automatic readability assessment (ARA) has shifted towards using expensive deep learning-based methods with the primary goal of increasing models’ accuracy. This, however, is rarely applicable for low-resource languages where traditional handcrafted features are still widely used due to the lack of existing NLP tools to extract deeper linguistic representations. In this work, we take a step back from the technical component and focus on how linguistic aspects such as mutual intelligibility or degree of language relatedness can improve ARA in a low-resource setting. We collect short stories written in three languages in the Philippines—Tagalog, Bikol, and Cebuano—to train readability assessment models and explore the interaction of data and features in various cross-lingual setups. Our results show that the inclusion of CrossNGO, a novel specialized feature exploiting n-gram overlap applied to languages with high mutual intelligibility, significantly improves the performance of ARA models compared to the use of off-the-shelf large multilingual language models alone. Consequently, when both linguistic representations are combined, we achieve state-of-the-art results for Tagalog and Cebuano, and baseline scores for ARA in Bikol.

pdf bib
BasahaCorpus: An Expanded Linguistic Resource for Readability Assessment in Central Philippine Languages
Joseph Marvin Imperial | Ekaterina Kochmar
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Current research on automatic readability assessment (ARA) has focused on improving the performance of models in high-resource languages such as English. In this work, we introduce and release BasahaCorpus as part of an initiative aimed at expanding available corpora and baseline models for readability assessment in lower resource languages in the Philippines. We compiled a corpus of short fictional narratives written in Hiligaynon, Minasbate, Karay-a, and Rinconada—languages belonging to the Central Philippine family tree subgroup—to train ARA models using surface-level, syllable-pattern, and n-gram overlap features. We also propose a new hierarchical cross-lingual modeling approach that takes advantage of a language’s placement in the family tree to increase the amount of available training data. Our study yields encouraging results that support previous work showcasing the efficacy of cross-lingual models in low-resource settings, as well as similarities in highly informative linguistic features for mutually intelligible languages.

pdf bib
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Ekaterina Kochmar | Jill Burstein | Andrea Horbach | Ronja Laarmann-Quante | Nitin Madnani | Anaïs Tack | Victoria Yaneva | Zheng Yuan | Torsten Zesch
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

pdf bib
The BEA 2023 Shared Task on Generating AI Teacher Responses in Educational Dialogues
Anaïs Tack | Ekaterina Kochmar | Zheng Yuan | Serge Bibauw | Chris Piech
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

This paper describes the results of the first shared task on generation of teacher responses in educational dialogues. The goal of the task was to benchmark the ability of generative language models to act as AI teachers, replying to a student in a teacher-student dialogue. Eight teams participated in the competition hosted on CodaLab and experimented with a wide variety of state-of-the-art models, including Alpaca, Bloom, DialoGPT, DistilGPT-2, Flan-T5, GPT- 2, GPT-3, GPT-4, LLaMA, OPT-2.7B, and T5- base. Their submissions were automatically scored using BERTScore and DialogRPT metrics, and the top three among them were further manually evaluated in terms of pedagogical ability based on Tack and Piech (2022). The NAISTeacher system, which ranked first in both automated and human evaluation, generated responses with GPT-3.5 Turbo using an ensemble of prompts and DialogRPT-based ranking of responses for given dialogue contexts. Despite promising achievements of the participating teams, the results also highlight the need for evaluation metrics better suited to educational contexts.

2022

pdf bib
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
Ekaterina Kochmar | Jill Burstein | Andrea Horbach | Ronja Laarmann-Quante | Nitin Madnani | Anaïs Tack | Victoria Yaneva | Zheng Yuan | Torsten Zesch
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)

2021

pdf bib
Word Complexity is in the Eye of the Beholder
Sian Gooding | Ekaterina Kochmar | Seid Muhie Yimam | Chris Biemann
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Lexical complexity is a highly subjective notion, yet this factor is often neglected in lexical simplification and readability systems which use a ”one-size-fits-all” approach. In this paper, we investigate which aspects contribute to the notion of lexical complexity in various groups of readers, focusing on native and non-native speakers of English, and how the notion of complexity changes depending on the proficiency level of a non-native reader. To facilitate reproducibility of our approach and foster further research into these aspects, we release a dataset of complex words annotated by readers with different backgrounds.

pdf bib
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
Jill Burstein | Andrea Horbach | Ekaterina Kochmar | Ronja Laarmann-Quante | Claudia Leacock | Nitin Madnani | Ildikó Pilán | Helen Yannakoudakis | Torsten Zesch
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

2020

pdf bib
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications
Jill Burstein | Ekaterina Kochmar | Claudia Leacock | Nitin Madnani | Ildikó Pilán | Helen Yannakoudakis | Torsten Zesch
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Detecting Multiword Expression Type Helps Lexical Complexity Assessment
Ekaterina Kochmar | Sian Gooding | Matthew Shardlow
Proceedings of the Twelfth Language Resources and Evaluation Conference

Multiword expressions (MWEs) represent lexemes that should be treated as single lexical units due to their idiosyncratic nature. Multiple NLP applications have been shown to benefit from MWE identification, however the research on lexical complexity of MWEs is still an under-explored area. In this work, we re-annotate the Complex Word Identification Shared Task 2018 dataset of Yimam et al. (2017), which provides complexity scores for a range of lexemes, with the types of MWEs. We release the MWE-annotated dataset with this paper, and we believe this dataset represents a valuable resource for the text simplification community. In addition, we investigate which types of expressions are most problematic for native and non-native readers. Finally, we show that a lexical complexity assessment system benefits from the information about MWE types.

pdf bib
SeCoDa: Sense Complexity Dataset
David Strohmaier | Sian Gooding | Shiva Taslimipoor | Ekaterina Kochmar
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Sense Complexity Dataset (SeCoDa) provides a corpus that is annotated jointly for complexity and word senses. It thus provides a valuable resource for both word sense disambiguation and the task of complex word identification. The intention is that this dataset will be used to identify complexity at the level of word senses rather than word tokens. For word sense annotation SeCoDa uses a hierarchical scheme that is based on information available in the Cambridge Advanced Learner’s Dictionary. This way we can offer more coarse-grained senses than directly available in WordNet.

pdf bib
MTLB-STRUCT @Parseme 2020: Capturing Unseen Multiword Expressions Using Multi-task Learning and Pre-trained Masked Language Models
Shiva Taslimipoor | Sara Bahaadini | Ekaterina Kochmar
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

This paper describes a semi-supervised system that jointly learns verbal multiword expressions (VMWEs) and dependency parse trees as an auxiliary task. The model benefits from pre-trained multilingual BERT. BERT hidden layers are shared among the two tasks and we introduce an additional linear layer to retrieve VMWE tags. The dependency parse tree prediction is modelled by a linear layer and a bilinear one plus a tree CRF architecture on top of the shared BERT. The system has participated in the open track of the PARSEME shared task 2020 and ranked first in terms of F1-score in identifying unseen VMWEs as well as VMWEs in general, averaged across all 14 languages.

pdf bib
Incorporating Multiword Expressions in Phrase Complexity Estimation
Sian Gooding | Shiva Taslimipoor | Ekaterina Kochmar
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Multiword expressions (MWEs) were shown to be useful in a number of NLP tasks. However, research on the use of MWEs in lexical complexity assessment and simplification is still an under-explored area. In this paper, we propose a text complexity assessment system for English, which incorporates MWE identification. We show that detecting MWEs using state-of-the-art systems improves predicting complexity on an established lexical complexity dataset.

2019

pdf bib
Complex Word Identification as a Sequence Labelling Task
Sian Gooding | Ekaterina Kochmar
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Complex Word Identification (CWI) is concerned with detection of words in need of simplification and is a crucial first step in a simplification pipeline. It has been shown that reliable CWI systems considerably improve text simplification. However, most CWI systems to date address the task on a word-by-word basis, not taking the context into account. In this paper, we present a novel approach to CWI based on sequence modelling. Our system is capable of performing CWI in context, does not require extensive feature engineering and outperforms state-of-the-art systems on this task.

pdf bib
Automatic learner summary assessment for reading comprehension
Menglin Xia | Ekaterina Kochmar | Ted Briscoe
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Automating the assessment of learner summary provides a useful tool for assessing learner reading comprehension. We present a summarization task for evaluating non-native reading comprehension and propose three novel approaches to automatically assess the learner summaries. We evaluate our models on two datasets we created and show that our models outperform traditional approaches that rely on exact word match on this task. Our best model produces quality assessments close to professional examiners.

pdf bib
Recursive Context-Aware Lexical Simplification
Sian Gooding | Ekaterina Kochmar
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper presents a novel architecture for recursive context-aware lexical simplification, REC-LS, that is capable of (1) making use of the wider context when detecting the words in need of simplification and suggesting alternatives, and (2) taking previous simplification steps into account. We show that our system outputs lexical simplifications that are grammatically correct and semantically appropriate, and outperforms the current state-of-the-art systems in lexical simplification.

pdf bib
Comparative judgments are more consistent than binary classification for labelling word complexity
Sian Gooding | Ekaterina Kochmar | Advait Sarkar | Alan Blackwell
Proceedings of the 13th Linguistic Annotation Workshop

Lexical simplification systems replace complex words with simple ones based on a model of which words are complex in context. We explore how users can help train complex word identification models through labelling more efficiently and reliably. We show that using an interface where annotators make comparative rather than binary judgments leads to more reliable and consistent labels, and explore whether comparative judgments may provide a faster way for collecting labels.

pdf bib
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Helen Yannakoudakis | Ekaterina Kochmar | Claudia Leacock | Nitin Madnani | Ildikó Pilán | Torsten Zesch
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

2018

pdf bib
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Ekaterina Kochmar | Claudia Leacock | Helen Yannakoudakis
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
CAMB at CWI Shared Task 2018: Complex Word Identification with Ensemble-Based Voting
Sian Gooding | Ekaterina Kochmar
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper presents the winning systems we submitted to the Complex Word Identification Shared Task 2018. We describe our best performing systems’ implementations and discuss our key findings from this research. Our best-performing systems achieve an F1 score of 0.8792 on the NEWS, 0.8430 on the WIKINEWS and 0.8115 on the WIKIPEDIA test sets in the monolingual English binary classification track, and a mean absolute error of 0.0558 on the NEWS, 0.0674 on the WIKINEWS and 0.0739 on the WIKIPEDIA test sets in the probabilistic track.

2017

pdf bib
Modelling semantic acquisition in second language learning
Ekaterina Kochmar | Ekaterina Shutova
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Using methods of statistical analysis, we investigate how semantic knowledge is acquired in English as a second language and evaluate the pace of development across a number of predicate types and content word combinations, as well as across the levels of language proficiency and native languages. Our exploratory study helps identify the most problematic areas for language learners with different backgrounds and at different stages of learning.

2016

pdf bib
Text Readability Assessment for Second Language Learners
Menglin Xia | Ekaterina Kochmar | Ted Briscoe
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
‘Calling on the classical phone’: a distributional model of adjective-noun errors in learners’ English
Aurélie Herbelot | Ekaterina Kochmar
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper we discuss three key points related to error detection (ED) in learners’ English. We focus on content word ED as one of the most challenging tasks in this area, illustrating our claims on adjective–noun (AN) combinations. In particular, we (1) investigate the role of context in accurately capturing semantic anomalies and implement a system based on distributional topic coherence, which achieves state-of-the-art accuracy on a standard test set; (2) thoroughly investigate our system’s performance across individual adjective classes, concluding that a class-dependent approach is beneficial to the task; (3) discuss the data size bottleneck in this area, and highlight the challenges of automatic error generation for content words.

pdf bib
Cross-Lingual Lexico-Semantic Transfer in Language Learning
Ekaterina Kochmar | Ekaterina Shutova
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Using Learner Data to Improve Error Correction in Adjective–Noun Combinations
Ekaterina Kochmar | Ted Briscoe
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

2014

pdf bib
Grammatical error correction using hybrid systems and type filtering
Mariano Felice | Zheng Yuan | Øistein E. Andersen | Helen Yannakoudakis | Ekaterina Kochmar
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Proceedings of the ACL 2014 Student Research Workshop
Ekaterina Kochmar | Annie Louis | Svitlana Volkova | Jordan Boyd-Graber | Bill Byrne
Proceedings of the ACL 2014 Student Research Workshop

pdf bib
Detecting Learner Errors in the Choice of Content Words Using Compositional Distributional Semantics
Ekaterina Kochmar | Ted Briscoe
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
Capturing Anomalies in the Choice of Content Words in Compositional Distributional Semantic Space
Ekaterina Kochmar | Ted Briscoe
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

pdf bib
HOO 2012 Error Recognition and Correction Shared Task: Cambridge University Submission Report
Ekaterina Kochmar | Øistein Andersen | Ted Briscoe
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP