Sentence Segmentation and Sentence Punctuation Based on XunziALLM

Zihong Chen


Abstract
Engraved editions of ancient Chinese books typically carry no punctuation marks, so sentence segmentation and punctuation depend heavily on the meticulous work of experts and scholars. Automatic punctuation and sentence segmentation therefore play an important role in promoting ancient books and in passing on Chinese culture. In this paper, we present a method that fine-tunes a large language model for these downstream tasks using LoRA, leveraging the EvaHan2024 dataset. The method ensures robust output and high accuracy while inheriting the knowledge of the large pre-trained language model Xunzi.
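To make the two tasks named in the abstract concrete, the following is a minimal sketch of how a punctuated passage could be turned into an unpunctuated model input, a punctuated target, and per-character segmentation labels. It is an illustration under stated assumptions, not the paper's pipeline: the punctuation inventory, the function names `make_training_pair` and `segmentation_labels`, and the `input`/`target` keys are all hypothetical, and the actual EvaHan2024 data format may differ.

```python
# Hypothetical punctuation inventory (full-width marks); the mark set
# actually used in EvaHan2024 may differ.
PUNCTUATION = set(",。、:;?!")


def make_training_pair(punctuated: str) -> dict:
    """Strip punctuation to form the model input; keep the original as the target."""
    source = "".join(ch for ch in punctuated if ch not in PUNCTUATION)
    return {"input": source, "target": punctuated}


def segmentation_labels(punctuated: str) -> list:
    """Return one label per non-punctuation character: 1 if a mark follows it, else 0."""
    labels = []
    for ch in punctuated:
        if ch in PUNCTUATION:
            # A mark closes the segment ending at the previous character.
            if labels:
                labels[-1] = 1
        else:
            labels.append(0)
    return labels


example = "學而時習之,不亦說乎?"
pair = make_training_pair(example)
labels = segmentation_labels(example)
```

Here `pair["input"]` is the unpunctuated string a model would receive, `pair["target"]` is the punctuated text it would be trained to produce, and `labels` marks the characters after which a sentence or clause break occurs.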
Anthology ID:
2024.lt4hala-1.30
Volume:
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rachele Sprugnoli, Marco Passarotti
Venues:
LT4HALA | WS
Publisher:
ELRA and ICCL
Pages:
246–250
URL:
https://aclanthology.org/2024.lt4hala-1.30
Cite (ACL):
Zihong Chen. 2024. Sentence Segmentation and Sentence Punctuation Based on XunziALLM. In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 246–250, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Sentence Segmentation and Sentence Punctuation Based on XunziALLM (Chen, LT4HALA-WS 2024)
PDF:
https://aclanthology.org/2024.lt4hala-1.30.pdf