Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
Bettina Messmer | Vinko Sabolčec | Martin Jaggi
Proceedings of the 10th edition of the Swiss Text Analytics Conference