NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes

Authors: Lizhou Fan, Wenyue Hua, Lingyao Li, Haoyang Ling, Yongfeng Zhang
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Resource type: text
Genre: conference publication
Pages: 4092-4114
Citation key: fan-etal-2024-nphardeval
DOI: 10.18653/v1/2024.acl-long.225
URL: https://aclanthology.org/2024.acl-long.225/