GenWebNovel: A Genre-oriented Corpus of Entities in Chinese Web Novels

Hanjie Zhao, Yuchen Yan, Senbin Zhu, Hongde Liu, Yuxiang Jia, Hongying Zan, Min Peng


Abstract
Entities are important to understanding literary works, which emphasize characters, plots and environment. Research on entity recognition in the literary domain, especially nested entity recognition, remains limited, partly because annotated data is scarce. To address this issue, we construct the first Genre-oriented Corpus for Entity Recognition in Chinese Web Novels, namely GenWebNovel, comprising 400 chapters totaling 1,214,283 tokens under two genres, XuanHuan (Eastern Fantasy) and History. Based on the corpus, we analyze the distribution of different types of entities, including person, location, and organization. We also compare the nesting patterns of nested entities between GenWebNovel and the English corpus LitBank. Even though both genres belong to the literary domain, their entities overlap little, making genre adaptation of NER (Named Entity Recognition) a hard problem. We propose a novel method that uses a pre-trained language model as an in-context learning example retriever to boost the performance of large language models. Our experiments show that this approach significantly enhances entity recognition, matching state-of-the-art (SOTA) models without requiring additional training data. Our code, dataset, and model are available at https://github.com/hjzhao73/GenWebNovel.
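The retrieval idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoder, the similarity measure, or the prompt format, so everything below is an assumption. In particular, `embed` is a character-bigram stand-in for a pre-trained language model encoder, the names `retrieve` and `build_prompt` are hypothetical, and the toy English sentences and labels are invented for illustration (the corpus itself is Chinese).

```python
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    """Stand-in encoder (assumption): character-bigram counts.

    A real system would use sentence vectors from a pre-trained language model.
    """
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k annotated examples whose sentences are most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, demos: list[tuple[str, str]]) -> str:
    """Assemble an in-context learning prompt from the retrieved demonstrations."""
    parts = ["Label the entities in each sentence."]
    for sentence, annotation in demos:
        parts.append(f"Sentence: {sentence}\nEntities: {annotation}")
    parts.append(f"Sentence: {query}\nEntities:")
    return "\n\n".join(parts)


# Hypothetical annotated pool; sentences and labels are made up for illustration.
pool = [
    ("The emperor left the capital at dawn.", "emperor=PER; capital=LOC"),
    ("The sect elders gathered on the mountain.", "sect elders=PER; mountain=LOC"),
    ("Rain fell all night.", "(none)"),
]
query = "The emperor returned to the capital."
demos = retrieve(query, pool, k=2)
prompt = build_prompt(query, demos)
```

The resulting `prompt` would then be sent to a large language model. Because the demonstrations are selected per query rather than fixed, no model weights are updated, which is one plausible reading of the abstract's claim that the approach needs no additional training data.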
Anthology ID:
2025.coling-main.259
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3836–3849
URL:
https://aclanthology.org/2025.coling-main.259/
Cite (ACL):
Hanjie Zhao, Yuchen Yan, Senbin Zhu, Hongde Liu, Yuxiang Jia, Hongying Zan, and Min Peng. 2025. GenWebNovel: A Genre-oriented Corpus of Entities in Chinese Web Novels. In Proceedings of the 31st International Conference on Computational Linguistics, pages 3836–3849, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
GenWebNovel: A Genre-oriented Corpus of Entities in Chinese Web Novels (Zhao et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.259.pdf