Mingbo Song
2025
WIKIGENBENCH: Exploring Full-length Wikipedia Generation under Real-World Scenario
Jiebin Zhang | Eugene J. Yu | Qinyu Chen | Chenhao Xiong | Dawei Zhu | Han Qian | Mingbo Song | Weimin Xiong | Xiaoguang Li | Qun Liu | Sujian Li
Proceedings of the 31st International Conference on Computational Linguistics
Generating comprehensive and accurate Wikipedia articles for newly emerging events under real-world scenarios presents significant challenges. Existing attempts fall short either by focusing only on short snippets or by using metrics that are insufficient to evaluate real-world scenarios. In this paper, we construct WIKIGENBENCH, a new benchmark consisting of 1,320 entries, designed to align with real-world scenarios in both generation and evaluation. For generation, we explore a real-world scenario where structured, full-length Wikipedia articles with citations are generated for new events using input documents from web sources. For evaluation, we integrate systematic metrics and LLM-based metrics to assess the verifiability, organization, and other aspects aligned with real-world scenarios. Based on this benchmark, we conduct extensive experiments using various models within three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with a fine-tuned generation model. Experimental results show that hierarchical structure-based methods can generate more comprehensive content, while fine-tuned methods achieve better verifiability. However, even the best methods still show a significant gap compared to existing Wikipedia content, indicating that further research is necessary.
Co-authors
- Qinyu Chen 1
- Xiaoguang Li 1
- Sujian Li (李素建) 1
- Qun Liu 1
- Han Qian 1