Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty
Inar Timiryasov | Jean-Loup Tastet
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning