OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Hainiu Xu, Runcong Zhao, Lixing Zhu, Jinhua Du, Yulan He

Conference paper in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8593–8623. Edited by Lun-Wei Ku, Andre Martins, and Vivek Srikumar. Association for Computational Linguistics, Bangkok, Thailand, August 2024.

Anthology ID: xu-etal-2024-opentom
DOI: 10.18653/v1/2024.acl-long.466
URL: https://aclanthology.org/2024.acl-long.466/