DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling

Title: DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
Author: Shanghaoran Quan
Date: 2024-08
Type: text (conference publication)
Published in: Findings of the Association for Computational Linguistics: ACL 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Place: Bangkok, Thailand
Pages: 7006-7028
Citation key: quan-2024-dmoerm
DOI: 10.18653/v1/2024.findings-acl.418
URL: https://aclanthology.org/2024.findings-acl.418/