Automated test generation to evaluate tool-augmented LLMs as conversational AI agents

Samuel Arcadinho, David Oliveira Aparicio, Mariana S C Almeida

Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, pages 54–68, Miami, Florida, USA, November 2024. Association for Computational Linguistics.
Editors: Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, Ryan Cotterell
DOI: 10.18653/v1/2024.genbench-1.4
https://aclanthology.org/2024.genbench-1.4/