Title: Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
Authors: Andrea Sottana, Bin Liang, Kai Zou, Zheng Yuan
Date: 2023-12
Type: text (conference publication)
Published in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Publisher: Association for Computational Linguistics
Place: Singapore
Pages: 8776-8788
Citation key: sottana-etal-2023-evaluation
DOI: 10.18653/v1/2023.emnlp-main.543
URL: https://aclanthology.org/2023.emnlp-main.543/