MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

Authors: Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, Zhanhui Zhou, Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Type: conference publication
Pages: 7421-7454
Citation key: bai-etal-2024-mt
DOI: 10.18653/v1/2024.acl-long.401
URL: https://aclanthology.org/2024.acl-long.401/