Katsiaryna Aharodnik


2025

From Handcrafted Features to LLMs: A Comparative Study in Native Language Identification
Aliyah C. Vanterpool | Katsiaryna Aharodnik
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

This study compares a traditional feature-engineering machine learning approach with a large language model (LLM) fine-tuning method for Native Language Identification (NLI). We explored the COREFL corpus, which consists of L2 English narratives produced by Spanish and German L1 speakers with lower-advanced English proficiency (C1) (Lozano et al., 2020). For the feature-engineering approach, we extracted language productivity, linguistic diversity, and n-gram features for Support Vector Machine (SVM) classification. We also evaluated sentence embeddings with SVM and logistic regression. For the LLM approach, we evaluated BERT-like models and GPT-4. The feature-engineering approach, particularly n-grams, outperformed the LLMs. Sentence-BERT embeddings with SVM achieved the second-highest accuracy (93%), while GPT-4 reached an average accuracy of 90.4% across three runs when prompted with labels. These findings suggest that feature engineering remains a robust method for NLI, especially for smaller datasets with subtle linguistic differences between classes. This study contributes to the comparative analysis of traditional machine learning and transformer-based LLMs, highlighting current LLM limitations in handling domain-specific data and their need for larger training resources.

2018

Designing a Russian Idiom-Annotated Corpus
Katsiaryna Aharodnik | Anna Feldman | Jing Peng
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2013

Automatic Identification of Learners’ Language Background Based on Their Writing in Czech
Katsiaryna Aharodnik | Marco Chang | Anna Feldman | Jirka Hana
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2011

A low-budget tagger for Old Czech
Jirka Hana | Anna Feldman | Katsiaryna Aharodnik
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities