Title: Balanced Data Sampling for Language Model Training with Clustering
Authors: Yunfan Shao, Linyang Li, Zhaoye Fei, Hang Yan, Dahua Lin, Xipeng Qiu
Date: 2024-08
Type: text (conference publication)
Venue: Findings of the Association for Computational Linguistics: ACL 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Pages: 14012-14023
Citation key: shao-etal-2024-balanced
DOI: 10.18653/v1/2024.findings-acl.833
URL: https://aclanthology.org/2024.findings-acl.833/