ACSE: An Ancient Character Semantic-Aware Embedding for Large Language Models

Zhihan Zhou; Daqian Shi; Lida Shi; Rui Song; Peiqiang Qiu; Xiaolei Diao; Hao Xu

Abstract

Research on the ancient Chinese language is of great significance for tracing Chinese history and civilization. Within large language model research, pre-Qin excavated documents such as Oracle Bone Inscriptions, Bronze Inscriptions, and the Bamboo Book of Chu remain understudied. There are three reasons: these ancient characters are poorly digitized, training corpora are extremely scarce, and the characters typically carry complex and rich semantic information. We therefore propose an ancient character semantic-aware embedding (ACSE) for large language models. The embedding integrates both the glyph and the lexicality of ancient characters and maps them into the modern Chinese semantic space. We also design a two-stage method for lightweight, parameter-efficient training of the embedding. Finally, we conduct extensive experiments on excavated documents from the pre-Qin period, and the results demonstrate the effectiveness of our approach.

Anthology ID: 2026.findings-acl.437
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9000–9012
URL: https://aclanthology.org/2026.findings-acl.437/
Cite (ACL): Zhihan Zhou, Daqian Shi, Lida Shi, Rui Song, Peiqiang Qiu, Xiaolei Diao, and Hao Xu. 2026. ACSE: An Ancient Character Semantic-Aware Embedding for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9000–9012, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ACSE: An Ancient Character Semantic-Aware Embedding for Large Language Models (Zhou et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.437.pdf
Checklist: 2026.findings-acl.437.checklist.pdf
