A Preliminary Study of RAG for Taiwanese Historical Archives

Claire Lin, Bo-Han Feng, Xuanjun Chen, Te-Lun Yang, Hung-Yi Lee, Jyh-Shing Roger Jang


Abstract
Retrieval-Augmented Generation (RAG) has emerged as a promising approach for knowledge-intensive tasks. However, few studies have examined RAG for Taiwanese historical archives. In this paper, we present an initial study of a RAG pipeline applied to two historical Traditional Chinese datasets, Fort Zeelandia and the Taiwan Provincial Council Gazette, along with their corresponding open-ended query sets. We systematically investigate how query characteristics and metadata integration strategies affect retrieval quality, answer generation, and overall system performance. The results show that early-stage metadata integration improves both retrieval and answer accuracy, while also revealing persistent challenges for RAG systems, including hallucinations during generation and difficulties in handling temporal or multi-hop historical queries.
Anthology ID: 2025.rocling-main.6
Volume: Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month: November
Year: 2025
Address: National Taiwan University, Taipei City, Taiwan
Editors: Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue: ROCLING
Publisher: Association for Computational Linguistics
Pages: 45–62
URL: https://aclanthology.org/2025.rocling-main.6/
Cite (ACL): Claire Lin, Bo-Han Feng, Xuanjun Chen, Te-Lun Yang, Hung-Yi Lee, and Jyh-Shing Roger Jang. 2025. A Preliminary Study of RAG for Taiwanese Historical Archives. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 45–62, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal): A Preliminary Study of RAG for Taiwanese Historical Archives (Lin et al., ROCLING 2025)
PDF: https://aclanthology.org/2025.rocling-main.6.pdf