Alexandre Nikolaev
2025
Case–Number Dissociation in Finnish Noun Embeddings: fastText vs. BERT Layer Effects
Alexandre Nikolaev | Yu-Ying Chuang | R. Harald Baayen
Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages
Motivated by how inflectional morphology is encoded in modern embeddings, we revisit the 55,271 inflected forms from the 2,000 most frequent Finnish nouns analyzed by Nikolaev et al. (2022) using fastText and ask a single question: where does inflectional morphology emerge in BERT? For each form, we extract minimal-context FinBERT vectors from every layer (1–12) by running each word in isolation and averaging its WordPiece vectors into a single representation. Using the same generating model as in Nikolaev et al. (2022), we impute latent vectors for the stem, Number, Case, Possessive, and Clitic, plus a higher-order interaction, and evaluate by rank-1 nearest correlation. Within BERT, accuracy follows an emergence curve from 67.21% (layer 1) to 86.16% (layer 12). The error mix shifts with depth: middle layers show a lower share of Case errors but a higher share of Number errors, whereas the top layer reverses this tendency; clitic-only errors are rare throughout. For context, the fastText ceiling is slightly higher (≈89%), but our focus is the layer-resolved profile inside BERT. The result is a compact, reproducible map of Finnish noun inflection across the BERT stack, showing how different inflectional cues become recoverable at different depths under an identical modeling and evaluation pipeline.
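A minimal sketch of the two steps the abstract describes, assuming the HuggingFace transformers API and the TurkuNLP FinBERT checkpoint; the model id, function names, and the exact correlation-based scoring are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch, not the authors' code: extract per-layer minimal-context
# vectors for a single Finnish word form and score predictions by rank-1
# nearest correlation. The checkpoint id is an assumption (the paper only
# says "FinBERT").
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "TurkuNLP/bert-base-finnish-cased-v1"  # assumed FinBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layerwise_word_vectors(word: str) -> dict[int, torch.Tensor]:
    """Run one word form in isolation and return, for each layer 1-12,
    the average of its WordPiece vectors (special tokens excluded)."""
    enc = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # (embeddings, layer 1, ..., layer 12)
    ids = enc["input_ids"][0]
    keep = (ids != tokenizer.cls_token_id) & (ids != tokenizer.sep_token_id)
    return {layer: states[0, keep].mean(dim=0)
            for layer, states in enumerate(hidden) if layer >= 1}

def rank1_correlation_accuracy(predicted: np.ndarray, gold: np.ndarray) -> float:
    """A prediction counts as correct if the gold vector it correlates with
    most strongly (Pearson r) is the vector of its own word form."""
    p = predicted - predicted.mean(axis=1, keepdims=True)
    g = gold - gold.mean(axis=1, keepdims=True)
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    corr = p @ g.T  # all pairwise Pearson correlations
    return float((corr.argmax(axis=1) == np.arange(len(gold))).mean())
```

For instance, calling layerwise_word_vectors on a single inflected form would return twelve vectors, one per encoder layer, which can then be compared against vectors reconstructed from the imputed stem and inflectional components.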
Generative AI for Technical Writing: Comparing Human and LLM Assessments of Generated Content
Karen de Souza | Alexandre Nikolaev | Maarit Koponen
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
Large language models (LLMs) have recently gained significant attention for their capabilities in natural language processing (NLP), particularly generative artificial intelligence (AI). LLMs can also be useful tools for software documentation technical writers. We present an assessment of technical documentation content generated by three different LLMs using retrieval-augmented generation (RAG) with product documentation as a knowledge base. The LLM-generated responses were analyzed in three ways: 1) manual error analysis by a technical writer, 2) automatic assessment using deterministic metrics (BLEU, ROUGE, token overlap), and 3) evaluation of correctness using an LLM as a judge. The results of these assessments were compared using network analysis and linear regression models to investigate statistical relationships, model preferences, and the distribution of human and LLM scores. The analyses concluded that human quality evaluation is more closely related to the LLM correctness judgments than to the deterministic metrics, even when different analysis frameworks are used.
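A minimal sketch of the deterministic-metric step, assuming the sacrebleu and rouge-score packages; the token-overlap definition (Jaccard over lower-cased whitespace tokens) and all names are illustrative assumptions rather than the exact setup used in the study.

```python
# Minimal sketch: score one RAG-generated answer against a reference
# documentation passage with BLEU, ROUGE, and a simple token-overlap measure.
import sacrebleu
from rouge_score import rouge_scorer

_rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def token_overlap(reference: str, candidate: str) -> float:
    """Jaccard overlap between the lower-cased whitespace-token sets."""
    ref, cand = set(reference.lower().split()), set(candidate.lower().split())
    return len(ref & cand) / len(ref | cand) if ref | cand else 0.0

def deterministic_scores(reference: str, candidate: str) -> dict:
    """Return BLEU, ROUGE-1/ROUGE-L F1, and token overlap for one answer."""
    bleu = sacrebleu.sentence_bleu(candidate, [reference]).score / 100.0
    rouge = _rouge.score(reference, candidate)
    return {
        "bleu": bleu,
        "rouge1_f": rouge["rouge1"].fmeasure,
        "rougeL_f": rouge["rougeL"].fmeasure,
        "token_overlap": token_overlap(reference, candidate),
    }
```

Scores computed this way per response can then be set alongside the human error analysis and the LLM-as-judge correctness ratings for the kind of correlation and regression comparison the abstract describes.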