Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates

Yusuke Sakai, Adam Nohejl, Jiangnan Hang, Hidetaka Kamigaito, Taro Watanabe. November 2024. In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP (editors: Yonatan Belinkov, Najoung Kim, Jaap Jumelet, Hosein Mohebbi, Aaron Mueller, Hanjie Chen), pages 499-529, Miami, Florida, US. Association for Computational Linguistics. Conference publication.
Anthology ID: sakai-etal-2024-toward
DOI: 10.18653/v1/2024.blackboxnlp-1.31
URL: https://aclanthology.org/2024.blackboxnlp-1.31/