Walking a Tightrope – Evaluating Large Language Models in High-Risk Domains

Authors: Chia-Chien Hung, Wiem Ben Rim, Lindsay Frost, Lars Bruckner, Carolin Lawrence
Date: 2023-12
Venue: Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP
Editors: Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Koustuv Sinha, Amirhossein Kazemnejad, Christos Christodoulopoulos, Ryan Cotterell, Elia Bruni
Publisher: Association for Computational Linguistics
Location: Singapore
Type: conference publication
Citation key: hung-etal-2023-walking
DOI: 10.18653/v1/2023.genbench-1.8
URL: https://aclanthology.org/2023.genbench-1.8/
Pages: 99-111