Sofia Neri


2024

Recurrent Networks Are (Linguistically) Better? An (Ongoing) Experiment on Small-LM Training on Child-Directed Speech in Italian
Achille Fusco | Matilde Barbini | Maria Letizia Piccini Bianchessi | Veronica Bressan | Sofia Neri | Sarah Rossi | Tommaso Sgrizzi | Cristiano Chesi
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

We discuss the strategies and results of a small-scale training program based on Italian child-directed speech (less than 3M tokens) for various network architectures. The rationale behind these experiments [1] is to understand the effect of this naturalistic training diet on different model architectures. Preliminary findings lead us to conclude that (a) different tokenization strategies produce only numerical, not statistically significant, improvements overall, even though the resulting segmentations align with linguistic intuitions to varying degrees; and (b) modified LSTM networks with a single layer and a structurally more controlled cell state perform worse in training (compared to standard one- and two-layered LSTM models) but better on linguistically critical contrasts. This suggests that standard loss/accuracy metrics in autoregressive training procedures are linguistically irrelevant and, more generally, misleading, since the best-trained models qualify as poorer “linguistic theories” ([2], pace [3]).
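
For context on the architecture being modified, the snippet below is a minimal PyTorch sketch of a standard single-layer LSTM cell with its gates written out explicitly. The class name and interface are illustrative only; the structurally more controlled cell state proposed in the paper is not reproduced here.

```python
import torch
import torch.nn as nn

class VanillaLSTMCell(nn.Module):
    """A standard LSTM cell with the gates written out explicitly.
    Illustrative only: the paper's variant further constrains how the
    cell state c is updated, which is not reproduced here."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One linear map produces pre-activations for all four gates at once.
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, h, c):
        z = self.gates(torch.cat([x, h], dim=-1))
        i, f, g, o = z.chunk(4, dim=-1)
        i = torch.sigmoid(i)              # input gate
        f = torch.sigmoid(f)              # forget gate
        o = torch.sigmoid(o)              # output gate
        g = torch.tanh(g)                 # candidate cell update
        c_new = f * c + i * g             # cell state: gated memory
        h_new = o * torch.tanh(c_new)     # hidden state passed onward
        return h_new, c_new
```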

Different Ways to Forget: Linguistic Gates in Recurrent Neural Networks
Cristiano Chesi | Veronica Bressan | Matilde Barbini | Achille Fusco | Maria Letizia Piccini Bianchessi | Sofia Neri | Sarah Rossi | Tommaso Sgrizzi
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning

This work explores alternative gating systems in simple Recurrent Neural Networks (RNNs) that induce linguistically motivated biases during training, ultimately affecting models’ performance on the BLiMP task. We focus exclusively on the BabyLM 10M training corpus (Strict-Small Track). Our experiments reveal that: (i) standard RNN variants—LSTMs and GRUs—are insufficient for properly learning the relevant set of linguistic constraints; (ii) the quality or size of the training corpus has little impact on these networks, as demonstrated by the comparable performance of LSTMs trained exclusively on the child-directed speech portion of the corpus; (iii) increasing the size of the embedding and hidden layers does not significantly improve performance. In contrast, specifically gated RNNs (eMG-RNNs), inspired by certain Minimalist Grammar intuitions, exhibit advantages in both training loss and BLiMP accuracy.
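
As an illustration of how BLiMP-style accuracy is commonly computed (a sketch of the general minimal-pair protocol, not necessarily the authors' exact pipeline), the snippet below scores both members of a pair by summed next-token log-probability and counts the pair as correct when the grammatical sentence receives the higher score. The model interface and function names are assumptions.

```python
import torch
import torch.nn.functional as F

def sentence_logprob(model, token_ids: torch.Tensor) -> float:
    """Summed next-token log-probability of a sentence under an
    autoregressive LM. Assumes `model` maps a (1, T) tensor of token
    ids to (1, T, V) logits; adapt to the actual model interface."""
    with torch.no_grad():
        logits = model(token_ids)                       # (1, T, V)
    log_probs = F.log_softmax(logits[:, :-1], dim=-1)   # predictions for positions 1..T-1
    targets = token_ids[:, 1:]                          # gold next tokens
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

def minimal_pair_correct(model, good_ids, bad_ids) -> bool:
    """A BLiMP-style pair counts as correct if the grammatical sentence
    is assigned a higher log-probability than its minimally different
    ungrammatical counterpart."""
    return sentence_logprob(model, good_ids) > sentence_logprob(model, bad_ids)
```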