PoemBERT: A Dynamic Masking Content and Ratio Based Semantic Language Model For Chinese Poem Generation

Chihan Huang, Xiaobo Shen


Abstract
Ancient Chinese poetry is a treasure of Chinese culture. To address the absence of pre-trained models for ancient poetry, we introduce PoemBERT, a BERT-based model pre-trained on a corpus of classical Chinese poetry. Recognizing the emotional depth and linguistic precision of poetry, we incorporate sentiment and pinyin embeddings into the model, which makes it more sensitive to emotional information and helps it handle Chinese characters that have multiple pronunciations. We also propose Character Importance-based masking and dynamic masking strategies, which significantly improve the model’s ability to extract imagery-related features and handle poetry-specific information. Fine-tuning PoemBERT on several downstream tasks, including poem generation and sentiment classification, yields state-of-the-art performance in both automatic and manual evaluations. We further explain our choice of dynamic masking rate strategy and propose a solution to the problem of small dataset size.
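The abstract does not specify how character importance is scored or how the masking rate changes, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a linear schedule for the masking rate (the hypothetical `dynamic_mask_rate`) and samples mask positions with probability proportional to externally supplied importance scores (the hypothetical `importance_mask`). All names, default rates, and scores are invented for illustration.

```python
import random


def dynamic_mask_rate(step, total_steps, start=0.10, end=0.30):
    """Assumed schedule: interpolate the masking rate linearly from start to end."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac


def importance_mask(tokens, importance, rate, rng, mask_token="[MASK]"):
    """Mask about rate * len(tokens) positions, sampled without replacement
    with probability proportional to the (assumed) importance scores."""
    n_mask = min(len(tokens), max(1, round(rate * len(tokens))))
    candidates = list(range(len(tokens)))
    # A tiny epsilon keeps sampling valid even if every score is zero.
    weights = [float(w) + 1e-9 for w in importance]
    chosen = set()
    for _ in range(n_mask):
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.add(candidates.pop(pick))
        weights.pop(pick)
    masked = [mask_token if i in chosen else t for i, t in enumerate(tokens)]
    return masked, sorted(chosen)


# Usage: one five-character couplet, with made-up importance scores
# that favour imagery characters such as "月" (moon) and "霜" (frost).
tokens = list("床前明月光疑是地上霜")
scores = [1, 1, 1, 5, 3, 1, 1, 1, 1, 5]
rate = dynamic_mask_rate(step=500, total_steps=1000)
masked, positions = importance_mask(tokens, scores, rate, random.Random(0))
print(rate, positions, "".join(masked))
```

In a real pre-training pipeline the masked sequence would feed a masked language modeling objective; the paper itself should be consulted for the actual importance measure and rate strategy.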
Anthology ID:
2025.coling-main.5
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
50–60
URL:
https://aclanthology.org/2025.coling-main.5/
Cite (ACL):
Chihan Huang and Xiaobo Shen. 2025. PoemBERT: A Dynamic Masking Content and Ratio Based Semantic Language Model For Chinese Poem Generation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 50–60, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
PoemBERT: A Dynamic Masking Content and Ratio Based Semantic Language Model For Chinese Poem Generation (Huang & Shen, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.5.pdf