Litao Lin
2024
Multi-Model Classical Chinese Event Trigger Word Recognition Driven by Incremental Pre-training
Litao Lin
|
Mengcheng Wu
|
Xueying Shen
|
Jiaxin Zhou
|
Shiyan Ou
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
“This paper addresses the task of identifying and classifying historical event trigger words in Classical Chinese, utilizing both small-scale and large-scale language models. Specifically, we selected the small-scale language model GujiBERT for intelligent processing of classical texts, and the large-scale language model Xunzi-Qwen-14b. Both models underwent continued pretraining and fine-tuning, resulting in GujiBERT-CHED-mlm and Xunzi-Qwen-14b-CHED. For the small-scale language model, we used a BiLSTM as the feature extraction module and a CRF as the decoding module, employing a sequence labeling paradigm to complete the evaluation experiments. For the large-scale language model, we optimized the prompt templates and used a sequence-to-sequence paradigm for evaluation experiments. Our experiments revealed that GujiBERT-BiLSTM-CRF achieved the best performance across all tasks, ranking fourth in overall performance among all participating teams. The large-scale language model demonstrated good semantic understanding abilities, reaching a preliminary usable level. Future research should focus on enhancing its ability to produce standardized outputs.”
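The abstract above describes a BERT-encoder → BiLSTM → CRF sequence-labeling architecture. A minimal sketch of that kind of tagger is shown below (not the authors' code), assuming PyTorch, Hugging Face transformers, and the pytorch-crf package; the checkpoint name and tag count are placeholders.

```python
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF

class BertBiLstmCrfTagger(nn.Module):
    """Sequence-labeling sketch: pretrained encoder + BiLSTM features + CRF decoding."""
    def __init__(self, encoder_name: str, num_tags: int, lstm_hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # e.g. a GujiBERT-style checkpoint
        enc_dim = self.encoder.config.hidden_size
        self.bilstm = nn.LSTM(enc_dim, lstm_hidden,
                              batch_first=True, bidirectional=True)  # feature extraction module
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)         # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)                   # decoding module

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        features, _ = self.bilstm(hidden)
        emissions = self.emission(features)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi-decoded best tag sequence per sentence
        return self.crf.decode(emissions, mask=mask)
```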
2023
EvaHan2023: Overview of the First International Ancient Chinese Translation Bakeoff
Dongbo Wang
|
Litao Lin
|
Zhixiao Zhao
|
Wenhao Ye
|
Kai Meng
|
Wenlong Sun
|
Lianzhen Zhao
|
Xue Zhao
|
Si Shen
|
Wei Zhang
|
Bin Li
Proceedings of ALT2023: Ancient Language Translation Workshop
This paper presents the results of the First International Ancient Chinese Translation Bakeoff (EvaHan), a shared task of the Ancient Language Translation Workshop (ALT2023) and a co-located event of the 19th Edition of the Machine Translation Summit 2023 (MTS 2023). We describe the motivation for holding an international shared contest, as well as the datasets and tracks. The contest consists of two modalities, closed and open. In the closed modality, participants were only allowed to use the provided training data; the participating teams achieved the highest BLEU scores of 27.3315 and 1.1102 in the tasks of translating Ancient Chinese to Modern Chinese and translating Ancient Chinese to English, respectively. In the open modality, contestants could use any available data and models. The participating teams achieved the highest BLEU scores of 29.6832 and 6.5493 in the Ancient Chinese to Modern Chinese and Ancient Chinese to English tasks, respectively.
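The BLEU figures above are corpus-level scores. A minimal sketch of how such scores are commonly computed with sacrebleu follows; the exact tokenization and scoring settings used in EvaHan are not specified here, so the "zh" tokenizer and the example sentences below are assumptions for the Ancient-to-Modern Chinese track.

```python
import sacrebleu

# Placeholder system outputs and references (one reference stream, one sentence each)
hypotheses = ["今日天氣甚好。"]
references = [["今天的天氣非常好。"]]

# Corpus-level BLEU with Chinese tokenization
bleu = sacrebleu.corpus_bleu(hypotheses, references, tokenize="zh")
print(f"BLEU = {bleu.score:.4f}")
```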