Giovanni Cassani


2024

BigNLI: Native Language Identification with Big Bird Embeddings
Sergey Kramp | Giovanni Cassani | Chris Emmery
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Native Language Identification (NLI) intends to classify an author’s native language based on their writing in another language. Historically, the task has heavily relied on time-consuming linguistic feature engineering, and NLI transformer models have thus far failed to offer effective, practical alternatives. The current work shows input size is a limiting factor, and that classifiers trained using Big Bird embeddings outperform linguistic feature engineering models (for which we reproduce previous work) by a large margin on the Reddit-L2 dataset. Additionally, we provide further insight into input length dependencies, show consistent out-of-sample (Europe subreddit) and out-of-domain (TOEFL-11) performance, and qualitatively analyze the embedding space. Given the effectiveness and computational efficiency of this method, we believe it offers a promising avenue for future NLI work.

2015

Towards a Model of Prediction-based Syntactic Category Acquisition: First Steps with Word Embeddings
Robert Grimm | Giovanni Cassani | Walter Daelemans | Steven Gillis
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning

Which distributional cues help the most? Unsupervised contexts selection for lexical category acquisition
Giovanni Cassani | Robert Grimm | Walter Daelemans | Steven Gillis
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning