Qinyu Chen
2025
WIKIGENBENCH:Exploring Full-length Wikipedia Generation under Real-World Scenario
Jiebin Zhang
|
Eugene J. Yu
|
Qinyu Chen
|
Chenhao Xiong
|
Dawei Zhu
|
Han Qian
|
Mingbo Song
|
Weimin Xiong
|
Xiaoguang Li
|
Qun Liu
|
Sujian Li
Proceedings of the 31st International Conference on Computational Linguistics
It presents significant challenges to generate comprehensive and accurate Wikipedia articles for newly emerging events under real-world scenario. Existing attempts fall short either by focusing only on short snippets or by using metrics that are insufficient to evaluate real-world scenarios. In this paper, we construct WIKIGENBENCH, a new benchmark consisting of 1,320 entries, designed to align with real-world scenarios in both generation and evaluation. For generation, we explore a real-world scenario where structured, full-length Wikipedia articles with citations are generated for new events using input documents from web sources. For evaluation, we integrate systematic metrics and LLM-based metrics to assess the verifiability, organization, and other aspects aligned with real-world scenarios. Based on this benchmark, we conduct extensive experiments using various models within three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with fine-tuned generation model. Experimental results show that hierarchical-based methods can generate more comprehensive content, while fine-tuned methods achieve better verifiability. However, even the best methods still show a significant gap compared to existing Wikipedia content, indicating that further research is necessary.
2023
Exploring In-Context Learning for Knowledge Grounded Dialog Generation
Qinyu Chen
|
Wenhao Wu
|
Sujian Li
Findings of the Association for Computational Linguistics: EMNLP 2023
Large neural-based dialog generation models have been applied in many real-life scenarios, yet they are prone to hallucination and tend to produce factually inaccurate outputs which raise great concerns. To alleviate this problem, we propose a plug-and-play retrieval-based framework IKA, which leverages in-context learning and retrieval techniques to enhance LLMs on knowledge grounded dialog generation. We design thorough experiments on a large-scale knowledge graph with 1M+ facts to investigate the effectiveness and generalization of our framework. Experiments show that our method surpasses previous training-based SOTA by a large margin, specifically 46.67% in BLEU4, 26.01% in ROUGE-L, 122.90% in BARTScore and 30.50% in Entity Coverage F1. Further analysis show promising abilities of LLMs to perform knowledge-intensive tasks, which is previously considered weak and understudied.