Title: Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Authors: Siyuan Wang, Zhuohan Long, Zhihao Fan, Xuanjing Huang, Zhongyu Wei
Date: 2025-01
Type: conference publication (text)
In: Proceedings of the 31st International Conference on Computational Linguistics
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Publisher: Association for Computational Linguistics
Location: Abu Dhabi, UAE
Pages: 3310-3328
Citation key: wang-etal-2025-benchmark
URL: https://aclanthology.org/2025.coling-main.223/