2024
Cognitive Plausibility in Natural Language Processing
Yevgen Matusevych
Computational Linguistics, Volume 50, Issue 3 - September 2024
Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in BabyLlama with Reverse KL Divergence
Shaozhen Shi | Yevgen Matusevych | Malvina Nissim
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning
This study presents our submission to the Strict-Small Track of the 2nd BabyLM Challenge. We use a teacher-student distillation setup with the BabyLlama model (Timiryasov and Tastet, 2023) as a backbone. To make the student’s learning process more focused, we replace the objective function with a reverse Kullback-Leibler divergence, known to cause mode-seeking (rather than mode-averaging) behaviour in computational learners. We further experiment with having a single teacher (instead of an ensemble of two teachers) and implement additional optimization strategies to improve the distillation process. Our experiments show that under reverse KL divergence, a single-teacher model often outperforms or matches multiple-teacher models across most tasks. Additionally, incorporating advanced optimization techniques further enhances model performance, demonstrating the effectiveness and robustness of our proposed approach. These findings support our idea that “choosy babies need one coach”.
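As a rough illustration of the change described above, the sketch below swaps the usual forward-KL distillation loss for a reverse-KL one. It is a minimal sketch, not the authors' implementation; the function name and the temperature handling are assumptions.

```python
# Minimal sketch of reverse-KL distillation (illustrative, not the authors' code).
import torch
import torch.nn.functional as F

def reverse_kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Reverse KL, i.e. KL(student || teacher): mode-seeking, so the student
    concentrates its probability mass on the teacher's dominant modes instead
    of averaging over all of them (as forward KL encourages)."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits.detach() / temperature, dim=-1)
    # F.kl_div(input, target, log_target=True) computes KL(target || input),
    # so passing the teacher as `input` and the student as `target`
    # gives the reverse direction KL(student || teacher).
    return F.kl_div(
        teacher_log_probs,
        student_log_probs,
        reduction="batchmean",
        log_target=True,
    ) * temperature ** 2
```

In the single-teacher setting described in the abstract, `teacher_logits` would come from one frozen teacher model rather than an ensemble average.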
Visually Grounded Speech Models Have a Mutual Exclusivity Bias
Leanne Nortje | Dan Oneaţă | Yevgen Matusevych | Herman Kamper
Transactions of the Association for Computational Linguistics, Volume 12
When children learn new words, they employ constraints such as the mutual exclusivity (ME) bias: A novel word is mapped to a novel object rather than a familiar one. This bias has been studied computationally, but only in models that use discrete word representations as input, ignoring the high variability of spoken words. We investigate the ME bias in the context of visually grounded speech models that learn from natural images and continuous speech audio. Concretely, we train a model on familiar words and test its ME bias by asking it to select between a novel and a familiar object when queried with a novel word. To simulate prior acoustic and visual knowledge, we experiment with several initialization strategies using pretrained speech and vision networks. Our findings reveal the ME bias across the different initialization approaches, with a stronger bias in models with more prior (in particular, visual) knowledge. Additional tests confirm the robustness of our results, even when different loss functions are considered. Based on detailed analyses to tease apart the model’s representation space, we attribute the ME bias to how familiar and novel classes are distinctly separated in the resulting space.
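The test described above boils down to a two-alternative forced choice: given a spoken novel word, does the model prefer the novel image over the familiar one? A minimal sketch of such a check follows, assuming the model exposes audio and image embeddings that can be compared with cosine similarity (the names below are illustrative, not the authors' code).

```python
# Illustrative sketch of a mutual-exclusivity test (not the authors' code).
# `novel_word_emb`, `novel_image_emb`, and `familiar_image_emb` stand in for
# outputs of the model's speech and vision encoders.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def me_choice(novel_word_emb, novel_image_emb, familiar_image_emb):
    """Return True if the model maps the novel spoken word to the novel object."""
    score_novel = cosine_similarity(novel_word_emb, novel_image_emb)
    score_familiar = cosine_similarity(novel_word_emb, familiar_image_emb)
    return score_novel > score_familiar

# The ME bias is then the proportion of trials where me_choice is True;
# values above chance (0.5) indicate a mutual exclusivity bias.
```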
2021
A phonetic model of non-native spoken word processing
Yevgen Matusevych | Herman Kamper | Thomas Schatz | Naomi Feldman | Sharon Goldwater
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Non-native speakers show difficulties with spoken word processing. Many studies attribute these difficulties to imprecise phonological encoding of words in the lexical memory. We test an alternative hypothesis: that some of these difficulties can arise from the non-native speakers’ phonetic perception. We train a computational model of phonetic learning, which has no access to phonology, on either one or two languages. We first show that the model exhibits predictable behaviors on phone-level and word-level discrimination tasks. We then test the model on a spoken word processing task, showing that phonology may not be necessary to explain some of the word processing effects observed in non-native speakers. We run an additional analysis of the model’s lexical representation space, showing that the two training languages are not fully separated in that space, similarly to the languages of a bilingual human speaker.
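Discrimination tasks of the kind mentioned above are commonly scored in an ABX-style setup; the sketch below is an illustration under that assumption rather than the paper's exact procedure.

```python
# Illustrative ABX-style discrimination test (an assumption about the setup,
# not the paper's exact procedure). A and X share a category (e.g. the same
# phone or word), B does not; the model succeeds if X is closer to A than to B.
import numpy as np

def abx_correct(emb_a, emb_b, emb_x):
    """Return True when the representation of X is closer to A than to B."""
    dist_ax = np.linalg.norm(emb_a - emb_x)
    dist_bx = np.linalg.norm(emb_b - emb_x)
    return dist_ax < dist_bx

# Discrimination accuracy is the fraction of (A, B, X) triplets scored correctly;
# comparing models trained on one vs. two languages on such triplets probes the
# perceptual effects discussed in the abstract.
```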
2019
Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection
Maria Corkery | Yevgen Matusevych | Sharon Goldwater
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
The cognitive mechanisms needed to account for the English past tense have long been a subject of debate in linguistics and cognitive science. Neural network models were proposed early on, but were shown to have clear flaws. Recently, however, Kirov and Cotterell (2018) showed that modern encoder-decoder (ED) models overcome many of these flaws. They also presented evidence that ED models demonstrate humanlike performance in a nonce-word task. Here, we look more closely at the behaviour of their model in this task. We find that (1) the model exhibits instability across multiple simulations in terms of its correlation with human data, and (2) even when results are aggregated across simulations (treating each simulation as an individual human participant), the fit to the human data is not strong—worse than an older rule-based model. These findings hold up through several alternative training regimes and evaluation measures. Although other neural architectures might do better, we conclude that there is still insufficient evidence to claim that neural nets are a good cognitive model for this task.
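The evaluation described above compares per-simulation model behaviour on nonce verbs with human data, both run by run and aggregated across runs. A minimal sketch of that comparison, with placeholder data and variable names (not the authors' pipeline), could look like this:

```python
# Illustrative sketch: correlate model behaviour on nonce verbs with human data
# (placeholder arrays and sizes; not the authors' pipeline).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
model_scores = rng.random((25, 60))   # 25 simulations x 60 nonce verbs (placeholder)
human_ratings = rng.random(60)        # mean human rating per nonce verb (placeholder)

# Per-simulation fit: is the correlation with human data stable across runs?
per_sim = [spearmanr(sim, human_ratings)[0] for sim in model_scores]

# Aggregated fit: treat each simulation as a participant, average, then correlate.
agg_rho = spearmanr(model_scores.mean(axis=0), human_ratings)[0]

print(f"per-simulation rho: mean={np.mean(per_sim):.2f}, sd={np.std(per_sim):.2f}")
print(f"aggregated rho: {agg_rho:.2f}")
```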
2018
Modeling bilingual word associations as connected monolingual networks
Yevgen Matusevych | Amir Ardalan Kalantari Dehaghi | Suzanne Stevenson
Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)
2013
Computational simulations of second language construction learning
Yevgen Matusevych | Afra Alishahi | Ad Backus
Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL)