Alina Hancharova
2024
BabyLM Challenge: Experimenting with Self-Distillation and Reverse-Distillation for Language Model Pre-Training on Constrained Datasets
Aakarsh Nair | Alina Hancharova | Mayank Kumar | Ali Gharaee
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning
Language models (LMs) exhibit significant data inefficiency compared to human learners. A child is able to master language while consuming fewer than 100 million words of input, whereas language models require orders of magnitude more tokens during training. Our submission to the BabyLM Challenge uses a combination of self-distillation and reverse-distillation to train a sequence of ensemble models with improved training characteristics on a fixed-size 10-million-word dataset. Self-distillation is used to generate an ensemble of models of a fixed size, while reverse-distillation is used to train a larger, more expressive model from a previously trained generation of smaller models while largely preserving learned accuracy. We find that ensembles consisting of two smaller models and one identical born-again model serve as ideal ensembles for each trained generation of model size. We demonstrate that, although our method is not novel, it provides consistent and modest performance improvements on the BLiMP and GLUE benchmarks.
2023
Team ISCL_WINTER at SemEval-2023 Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset
Alina Hancharova | John Wang | Mayank Kumar
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper presents a study on the effectiveness of various approaches for addressing the challenge of multilingual sentiment analysis in low-resource African languages. The approaches evaluated in the study include Support Vector Machines (SVM), translation, and an ensemble of pre-trained multilingual sentiment models. The paper provides a detailed analysis of the performance of each approach based on experimental results. Our findings suggest that the ensemble method is the most effective, with an F1-score of 0.68 in the final testing. This system ranked 19th out of 33 participants in the competition.