@inproceedings{nikolaev-etal-2025-case,
title = "Case{--}Number Dissociation in {F}innish Noun Embeddings: fast{T}ext vs. {BERT} Layer Effects",
author = "Nikolaev, Alexandre and
Chuang, Yu-Ying and
Baayen, R. Harald",
editor = {H{\"a}m{\"a}l{\"a}inen, Mika and
Rie{\ss}ler, Michael and
Morooka, Eiaki V. and
Kharlashkin, Lev},
booktitle = "Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages",
month = dec,
year = "2025",
address = "Joensuu, Finland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.iwclul-1.16/",
pages = "127--130",
ISBN = "979-8-89176-360-9",
abstract = "Motivated by how inflectional morphology is encoded in modern embeddings, we revisit the 55,271 inflected forms from the 2,000 most frequent Finnish nouns analyzed by Nikolaev et al. (2022) using fastText and ask a single question: where does inflectional morphology emerge in BERT? For each form, we extract minimal-context FinBERT vectors from every layer (1{--}12) by running each word in isolation and averaging its WordPiece vectors into a single representation. Using the same generating model as in Nikolaev et al. (2022), we impute latent vectors for the stem, Number, Case, Possessive, and Clitic, plus a higher-order interaction, and evaluate by rank-1 nearest correlation. Within BERT, accuracy follows an emergence curve from 67.21{\%} (layer 1) to 86.16{\%} (layer 12). The error mix shifts with depth: middle layers show a lower share of Case errors but a higher share of Number errors, whereas the top layer reverses this tendency; clitic-only errors are rare throughout. For context, the fastText ceiling is slightly higher ({\ensuremath{\approx}}89{\%}), but our focus is the layer-resolved profile inside BERT. The result is a compact, reproducible map of Finnish noun inflection across the BERT stack, showing how different inflectional cues become recoverable at different depths (BERT layers) under an identical modeling and evaluation pipeline."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="nikolaev-etal-2025-case">
<titleInfo>
<title>Case–Number Dissociation in Finnish Noun Embeddings: fastText vs. BERT Layer Effects</title>
</titleInfo>
<name type="personal">
<namePart type="given">Alexandre</namePart>
<namePart type="family">Nikolaev</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yu-Ying</namePart>
<namePart type="family">Chuang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">R</namePart>
<namePart type="given">Harald</namePart>
<namePart type="family">Baayen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mika</namePart>
<namePart type="family">Hämäläinen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Michael</namePart>
<namePart type="family">Rießler</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Eiaki</namePart>
<namePart type="given">V</namePart>
<namePart type="family">Morooka</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lev</namePart>
<namePart type="family">Kharlashkin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Joensuu, Finland</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-360-9</identifier>
</relatedItem>
<abstract>Motivated by how inflectional morphology is encoded in modern embeddings, we revisit the 55,271 inflected forms from the 2,000 most frequent Finnish nouns analyzed by Nikolaev et al. (2022) using fastText and ask a single question: where does inflectional morphology emerge in BERT? For each form, we extract minimal-context FinBERT vectors from every layer (1–12) by running each word in isolation and averaging its WordPiece vectors into a single representation. Using the same generating model as in Nikolaev et al. (2022), we impute latent vectors for the stem, Number, Case, Possessive, and Clitic, plus a higher-order interaction, and evaluate by rank-1 nearest correlation. Within BERT, accuracy follows an emergence curve from 67.21% (layer 1) to 86.16% (layer 12). The error mix shifts with depth: middle layers show a lower share of Case errors but a higher share of Number errors, whereas the top layer reverses this tendency; clitic-only errors are rare throughout. For context, the fastText ceiling is slightly higher (≈89%), but our focus is the layer-resolved profile inside BERT. The result is a compact, reproducible map of Finnish noun inflection across the BERT stack, showing how different inflectional cues become recoverable at different depths (BERT layers) under an identical modeling and evaluation pipeline.</abstract>
<identifier type="citekey">nikolaev-etal-2025-case</identifier>
<location>
<url>https://aclanthology.org/2025.iwclul-1.16/</url>
</location>
<part>
<date>2025-12</date>
<extent unit="page">
<start>127</start>
<end>130</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Case–Number Dissociation in Finnish Noun Embeddings: fastText vs. BERT Layer Effects
%A Nikolaev, Alexandre
%A Chuang, Yu-Ying
%A Baayen, R. Harald
%Y Hämäläinen, Mika
%Y Rießler, Michael
%Y Morooka, Eiaki V.
%Y Kharlashkin, Lev
%S Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages
%D 2025
%8 December
%I Association for Computational Linguistics
%C Joensuu, Finland
%@ 979-8-89176-360-9
%F nikolaev-etal-2025-case
%X Motivated by how inflectional morphology is encoded in modern embeddings, we revisit the 55,271 inflected forms from the 2,000 most frequent Finnish nouns analyzed by Nikolaev et al. (2022) using fastText and ask a single question: where does inflectional morphology emerge in BERT? For each form, we extract minimal-context FinBERT vectors from every layer (1–12) by running each word in isolation and averaging its WordPiece vectors into a single representation. Using the same generating model as in Nikolaev et al. (2022), we impute latent vectors for the stem, Number, Case, Possessive, and Clitic, plus a higher-order interaction, and evaluate by rank-1 nearest correlation. Within BERT, accuracy follows an emergence curve from 67.21% (layer 1) to 86.16% (layer 12). The error mix shifts with depth: middle layers show a lower share of Case errors but a higher share of Number errors, whereas the top layer reverses this tendency; clitic-only errors are rare throughout. For context, the fastText ceiling is slightly higher (≈89%), but our focus is the layer-resolved profile inside BERT. The result is a compact, reproducible map of Finnish noun inflection across the BERT stack, showing how different inflectional cues become recoverable at different depths (BERT layers) under an identical modeling and evaluation pipeline.
%U https://aclanthology.org/2025.iwclul-1.16/
%P 127-130
Markdown (Informal)
[Case–Number Dissociation in Finnish Noun Embeddings: fastText vs. BERT Layer Effects](https://aclanthology.org/2025.iwclul-1.16/) (Nikolaev et al., IWCLUL 2025)
ACL