2024
Cognitive Plausibility in Natural Language Processing
Yevgen Matusevych
Computational Linguistics, Volume 50, Issue 3 - September 2024
Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in BabyLlama with Reverse KL Divergence
Shaozhen Shi | Yevgen Matusevych | Malvina Nissim
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning
This study presents our submission to the Strict-Small Track of the 2nd BabyLM Challenge. We use a teacher-student distillation setup with the BabyLlama model (Timiryasov and Tastet, 2023) as a backbone. To make the student’s learning process more focused, we replace the objective function with a reverse Kullback-Leibler divergence, known to cause mode-seeking (rather than mode-averaging) behaviour in computational learners. We further experiment with having a single teacher (instead of an ensemble of two teachers) and implement additional optimization strategies to improve the distillation process. Our experiments show that under reverse KL divergence, a single-teacher model often outperforms or matches multiple-teacher models across most tasks. Additionally, incorporating advanced optimization techniques further enhances model performance, demonstrating the effectiveness and robustness of our proposed approach. These findings support our idea that “choosy babies need one coach”.
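As a rough illustration of the change described above, the sketch below swaps the usual forward-KL distillation loss for a reverse-KL one. It is a minimal sketch, not the authors' implementation; the function name and the temperature handling are assumptions.

```python
# Minimal sketch of reverse-KL distillation (illustrative, not the authors' code).
import torch
import torch.nn.functional as F

def reverse_kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Reverse KL, i.e. KL(student || teacher): mode-seeking, so the student
    concentrates its probability mass on the teacher's dominant modes instead
    of averaging over all of them (as forward KL encourages)."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits.detach() / temperature, dim=-1)
    # F.kl_div(input, target, log_target=True) computes KL(target || input),
    # so passing the teacher as `input` and the student as `target`
    # gives the reverse direction KL(student || teacher).
    return F.kl_div(
        teacher_log_probs,
        student_log_probs,
        reduction="batchmean",
        log_target=True,
    ) * temperature ** 2
```

In the single-teacher setting described in the abstract, `teacher_logits` would come from one frozen teacher model rather than an ensemble average.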
Visually Grounded Speech Models Have a Mutual Exclusivity Bias
Leanne Nortje | Dan Oneaţă | Yevgen Matusevych | Herman Kamper
Transactions of the Association for Computational Linguistics, Volume 12
When children learn new words, they employ constraints such as the mutual exclusivity (ME) bias: A novel word is mapped to a novel object rather than a familiar one. This bias has been studied computationally, but only in models that use discrete word representations as input, ignoring the high variability of spoken words. We investigate the ME bias in the context of visually grounded speech models that learn from natural images and continuous speech audio. Concretely, we train a model on familiar words and test its ME bias by asking it to select between a novel and a familiar object when queried with a novel word. To simulate prior acoustic and visual knowledge, we experiment with several initialization strategies using pretrained speech and vision networks. Our findings reveal the ME bias across the different initialization approaches, with a stronger bias in models with more prior (in particular, visual) knowledge. Additional tests confirm the robustness of our results, even when different loss functions are considered. Based on detailed analyses to tease apart the model’s representation space, we attribute the ME bias to how familiar and novel classes are distinctly separated in the resulting space.
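The test described above boils down to a two-alternative forced choice: given a spoken novel word, does the model prefer the novel image over the familiar one? A minimal sketch of such a check follows, assuming the model exposes audio and image embeddings that can be compared with cosine similarity (the names below are illustrative, not the authors' code).

```python
# Illustrative sketch of a mutual-exclusivity test (not the authors' code).
# `novel_word_emb`, `novel_image_emb`, and `familiar_image_emb` stand in for
# outputs of the model's speech and vision encoders.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def me_choice(novel_word_emb, novel_image_emb, familiar_image_emb):
    """Return True if the model maps the novel spoken word to the novel object."""
    score_novel = cosine_similarity(novel_word_emb, novel_image_emb)
    score_familiar = cosine_similarity(novel_word_emb, familiar_image_emb)
    return score_novel > score_familiar

# The ME bias is then the proportion of trials where me_choice is True;
# values above chance (0.5) indicate a mutual exclusivity bias.
```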
2021
A phonetic model of non-native spoken word processing
Yevgen Matusevych | Herman Kamper | Thomas Schatz | Naomi Feldman | Sharon Goldwater
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Non-native speakers show difficulties with spoken word processing. Many studies attribute these difficulties to imprecise phonological encoding of words in the lexical memory. We test an alternative hypothesis: that some of these difficulties can arise from the non-native speakers’ phonetic perception. We train a computational model of phonetic learning, which has no access to phonology, on either one or two languages. We first show that the model exhibits predictable behaviors on phone-level and word-level discrimination tasks. We then test the model on a spoken word processing task, showing that phonology may not be necessary to explain some of the word processing effects observed in non-native speakers. We run an additional analysis of the model’s lexical representation space, showing that the two training languages are not fully separated in that space, similarly to the languages of a bilingual human speaker.
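Discrimination tasks of the kind mentioned above are commonly scored in an ABX-style setup; the sketch below is an illustration under that assumption rather than the paper's exact procedure.

```python
# Illustrative ABX-style discrimination test (an assumption about the setup,
# not the paper's exact procedure). A and X share a category (e.g. the same
# phone or word), B does not; the model succeeds if X is closer to A than to B.
import numpy as np

def abx_correct(emb_a, emb_b, emb_x):
    """Return True when the representation of X is closer to A than to B."""
    dist_ax = np.linalg.norm(emb_a - emb_x)
    dist_bx = np.linalg.norm(emb_b - emb_x)
    return dist_ax < dist_bx

# Discrimination accuracy is the fraction of (A, B, X) triplets scored correctly;
# comparing models trained on one vs. two languages on such triplets probes the
# perceptual effects discussed in the abstract.
```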
2019
Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection
Maria Corkery | Yevgen Matusevych | Sharon Goldwater
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
The cognitive mechanisms needed to account for the English past tense have long been a subject of debate in linguistics and cognitive science. Neural network models were proposed early on, but were shown to have clear flaws. Recently, however, Kirov and Cotterell (2018) showed that modern encoder-decoder (ED) models overcome many of these flaws. They also presented evidence that ED models demonstrate humanlike performance in a nonce-word task. Here, we look more closely at the behaviour of their model in this task. We find that (1) the model exhibits instability across multiple simulations in terms of its correlation with human data, and (2) even when results are aggregated across simulations (treating each simulation as an individual human participant), the fit to the human data is not strong—worse than an older rule-based model. These findings hold up through several alternative training regimes and evaluation measures. Although other neural architectures might do better, we conclude that there is still insufficient evidence to claim that neural nets are a good cognitive model for this task.
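The evaluation described above compares per-simulation model behaviour on nonce verbs with human data, both run by run and aggregated across runs. A minimal sketch of that comparison, with placeholder data and variable names (not the authors' pipeline), could look like this:

```python
# Illustrative sketch: correlate model behaviour on nonce verbs with human data
# (placeholder arrays and sizes; not the authors' pipeline).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
model_scores = rng.random((25, 60))   # 25 simulations x 60 nonce verbs (placeholder)
human_ratings = rng.random(60)        # mean human rating per nonce verb (placeholder)

# Per-simulation fit: is the correlation with human data stable across runs?
per_sim = [spearmanr(sim, human_ratings)[0] for sim in model_scores]

# Aggregated fit: treat each simulation as a participant, average, then correlate.
agg_rho = spearmanr(model_scores.mean(axis=0), human_ratings)[0]

print(f"per-simulation rho: mean={np.mean(per_sim):.2f}, sd={np.std(per_sim):.2f}")
print(f"aggregated rho: {agg_rho:.2f}")
```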
2018
Modeling bilingual word associations as connected monolingual networks
Yevgen Matusevych | Amir Ardalan Kalantari Dehaghi | Suzanne Stevenson
Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)
2013
Computational simulations of second language construction learning
Yevgen Matusevych | Afra Alishahi | Ad Backus
Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL)