SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Jianfeng Gao, Fabian Peller-Konrad, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues
Published in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics
Location: Miami, Florida, USA
Date: 2024-11
Type: conference publication
Pages: 11592-11610
Citation key: dinh-etal-2024-sciex
DOI: 10.18653/v1/2024.emnlp-main.647
URL: https://aclanthology.org/2024.emnlp-main.647/