SubmissionNumber#=%=#24
FinalPaperTitle#=%=#Brandeis at VarDial 2024 DSL-ML Shared Task: Multilingual models, simple baselines and data augmentation
ShortPaperTitle#=%=#
NumberOfPages#=%=#11
CopyrightSigned#=%=#
JobTitle#==#
Organization#==#
Abstract#==#This paper describes the Brandeis University submission to the VarDial 2024 DSL-ML Shared Task on multi-label classification for discriminating between similar languages. Our submission consists of three entries per language to the closed track, where no additional data was permitted. Our approach involves a set of simple non-neural baselines using logistic regression, random forests, and support vector machines. We follow this by experimenting with fine-tuning multilingual BERT, either on a single language or on all the languages concatenated together. In addition to benchmarking the model architectures against one another on the development set, we perform extensive hyperparameter tuning, which is afforded by the small size of the training data. Our experiments on the development set suggest that fine-tuned mBERT systems significantly outperform the baselines for most languages. However, on the test set, our results indicate that simple models based on scikit-learn can perform surprisingly well and even outperform pretrained language models, as we observe with BCMS. Our submissions achieve the best performance on all languages as reported by the organizers. Our non-neural baseline also ranks in the top 3 for all languages except Spanish and French.
Author{1}{Firstname}#=%=#Jonne
Author{1}{Lastname}#=%=#Saleva
Author{1}{Username}#=%=#jonnesaleva
Author{1}{Email}#=%=#jonnesaleva@brandeis.edu
Author{1}{Affiliation}#=%=#Brandeis University
Author{2}{Firstname}#=%=#Chester
Author{2}{Lastname}#=%=#Palen-Michel
Author{2}{Username}#=%=#cpalemichel
Author{2}{Email}#=%=#cpalenmichel@brandeis.edu
Author{2}{Affiliation}#=%=#Brandeis University