Mingbo Song


2025

WIKIGENBENCH: Exploring Full-length Wikipedia Generation under Real-World Scenario
Jiebin Zhang | Eugene J. Yu | Qinyu Chen | Chenhao Xiong | Dawei Zhu | Han Qian | Mingbo Song | Weimin Xiong | Xiaoguang Li | Qun Liu | Sujian Li
Proceedings of the 31st International Conference on Computational Linguistics

Generating comprehensive and accurate Wikipedia articles for newly emerging events in real-world scenarios presents significant challenges. Existing attempts fall short either by focusing only on short snippets or by using metrics that are insufficient to evaluate real-world scenarios. In this paper, we construct WIKIGENBENCH, a new benchmark consisting of 1,320 entries, designed to align with real-world scenarios in both generation and evaluation. For generation, we explore a real-world scenario where structured, full-length Wikipedia articles with citations are generated for new events using input documents from web sources. For evaluation, we integrate systematic metrics and LLM-based metrics to assess verifiability, organization, and other aspects aligned with real-world scenarios. Based on this benchmark, we conduct extensive experiments using various models within three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with a fine-tuned generation model. Experimental results show that hierarchical structure-based methods can generate more comprehensive content, while fine-tuned methods achieve better verifiability. However, even the best methods still show a significant gap compared to existing Wikipedia content, indicating that further research is necessary.