Proceedings of ALT2023: Ancient Language Translation Workshop
EvaHan2023: Overview of the First International Ancient Chinese Translation Bakeoff
Dongbo Wang | Litao Lin | Zhixiao Zhao | Wenhao Ye | Kai Meng | Wenlong Sun | Lianzhen Zhao | Xue Zhao | Si Shen | Wei Zhang | Bin Li
This paper presents the results of the First International Ancient Chinese Translation Bakeoff (EvaHan), a shared task of the Ancient Language Translation Workshop (ALT2023) and a co-located event of the 19th Edition of the Machine Translation Summit 2023 (MTS 2023). We describe the motivation for holding an international shared contest, as well as the datasets and tracks. The contest consists of two modalities, closed and open. In the closed modality, participants are only allowed to use the training data; the participating teams achieved the highest BLEU scores of 27.3315 and 1.1102 in the tasks of translating Ancient Chinese to Modern Chinese and Ancient Chinese to English, respectively. In the open modality, contestants can use any available data and models; the participating teams achieved the highest BLEU scores of 29.6832 and 6.5493 in the Ancient-Chinese-to-Modern-Chinese and Ancient-Chinese-to-English tasks, respectively.
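For readers unfamiliar with how such scores are produced, the following is a minimal scoring sketch using sacrebleu; the toolkit, the file names, and the use of the "zh" tokenizer for the Modern Chinese track are assumptions, since the overview does not specify the official evaluation script.

```python
# Minimal BLEU-scoring sketch (assumptions: sacrebleu as the metric toolkit,
# hypothetical file names; the bakeoff's official scoring setup may differ).
import sacrebleu

with open("system_output.modern.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("reference.modern.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes a list of hypotheses and a list of reference streams;
# tokenize="zh" applies tokenization suitable for Chinese text.
bleu = sacrebleu.corpus_bleu(hypotheses, [references], tokenize="zh")
print(f"BLEU = {bleu.score:.4f}")
```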
The Ups and Downs of Training RoBERTa-based models on Smaller Datasets for Translation Tasks from Classical Chinese into Modern Standard Mandarin and Modern English
Stuart Michael McManus | Roslin Liu | Yuji Li | Leo Tam | Stephanie Qiu | Letian Yu
The paper presents an investigation into the effectiveness of the pre-trained language models Siku-RoBERTa and RoBERTa for Classical Chinese to Modern Standard Mandarin and Classical Chinese to English translation tasks. The English translation model performed unsatisfactorily due to the small dataset, while the Modern Standard Mandarin model gave reasonable results.
Pre-trained Model In Ancient-Chinese-to-Modern-Chinese Machine Translation
Jiahui Wang | Xuqin Zhang | Jiahuan Li | Shujian Huang
This paper presents an analysis of pre-trained Transformer-based neural machine translation (NMT) models for the Ancient-Chinese-to-Modern-Chinese machine translation task.
Some Trials on Ancient Modern Chinese Translation
Li Lin | Xinyu Hu
In this study, we explored various neural machine translation techniques for the task of translating ancient Chinese into modern Chinese. Our aim was to find an effective method for achieving accurate and reliable translation results. After experimenting with different approaches, we discovered that the method of concatenating adjacent sentences yielded the best performance among all the methods tested.
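As a concrete illustration of the concatenation strategy described above, the sketch below merges adjacent aligned sentence pairs into longer training examples; it is a reconstruction under assumptions (a sliding window of two sentences), not the authors' code.

```python
# Illustrative sketch: build longer training examples by concatenating adjacent
# aligned sentences (window size and sliding-window pairing are assumptions).
def concat_adjacent(src_sents, tgt_sents, window=2):
    """Merge each run of `window` consecutive aligned sentences into one pair."""
    assert len(src_sents) == len(tgt_sents)
    merged = []
    for i in range(len(src_sents) - window + 1):
        merged.append((
            "".join(src_sents[i:i + window]),  # ancient Chinese side
            "".join(tgt_sents[i:i + window]),  # modern Chinese side
        ))
    return merged

pairs = concat_adjacent(["学而时习之。", "不亦说乎？"],
                        ["学习并且时常温习它。", "不也很愉快吗？"])
print(pairs)  # one merged pair covering both sentences
```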
Istic Neural Machine Translation System for EvaHan 2023
Ningyuan Deng | Shuao Guo | Yanqing He
This paper presents the system architecture and technical details adopted by the Institute of Scientific and Technical Information of China (ISTIC) in the EvaHan 2023 evaluation. In this evaluation, ISTIC participated in two Ancient Chinese machine translation tasks: Ancient Chinese to Modern Chinese and Ancient Chinese to English. The paper mainly elaborates the model framework and data processing methods adopted in ISTIC's system. Finally, a comparison and analysis of different machine translation systems is also given.
BIT-ACT: An Ancient Chinese Translation System Using Data Augmentation
Li Zeng | Yanzhi Tian | Yingyu Shan | Yuhang Guo
This paper describes a translation model for ancient Chinese to modern Chinese and English for the EvaHan 2023 competition, a subtask of the Ancient Language Translation 2023 challenge. During training, we applied various data augmentation techniques and used SiKu-RoBERTa as part of our model architecture. The results indicate that back translation improves the model's performance, but double back translation introduces noise and harms it. Fine-tuning on the original dataset can help mitigate this issue.
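The back-translation augmentation mentioned in the abstract can be sketched as follows; `reverse_model.translate` is a hypothetical interface standing in for a modern-to-ancient model, and the sketch is not the authors' implementation.

```python
# Back-translation sketch: a reverse (modern -> ancient) model produces synthetic
# ancient-Chinese sources for monolingual modern sentences; the synthetic pairs
# are then mixed with the genuine parallel data. `reverse_model.translate` is a
# hypothetical interface, not the authors' code.
def back_translate(monolingual_modern, reverse_model):
    synthetic_pairs = []
    for modern in monolingual_modern:
        synthetic_ancient = reverse_model.translate(modern)   # modern -> ancient
        synthetic_pairs.append((synthetic_ancient, modern))   # train ancient -> modern
    return synthetic_pairs

# As the abstract notes, repeating the procedure (double back translation) can
# accumulate noise, which fine-tuning on the original parallel data may offset.
```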
Technical Report on Ancient Chinese Machine Translation Based on mRASP Model
Wenjing Liu | Jing Xie
Objective: This paper aims to improve the performance of machine translation of ancient Chinese classics, which can better promote research on ancient books and the spread of Chinese culture. Methods: Based on the multilingual machine translation pre-training model mRASP, the model was fine-tuned on the specific language pairs a2m and a2e, corresponding to the two downstream tasks of translating Classical Chinese into Modern Chinese and into English, using the ancient-to-modern Chinese parallel corpus and the ancient-Chinese-to-English parallel corpus of Pre-Qin+ZiZhiTongJian; the translation performance of the fine-tuned models was evaluated with the BLEU metric. Results: The BLEU4 scores for the three downstream tasks 24_histories_a2m, Pre-Qin+ZiZhiTongJian_a2m, and Pre-Qin+ZiZhiTongJian_a2e were 17.38, 13.69 and 12.90, respectively.
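A characteristic step in fine-tuning a multilingual model such as mRASP is tagging each sentence pair with language tokens so that one model can serve both the a2m and a2e directions; the sketch below illustrates this data preparation under assumed tag names and does not reproduce mRASP's actual preprocessing scripts.

```python
# Sketch of language-pair tagging for multilingual fine-tuning in the style of
# mRASP (the LANG_TOK_* tag names are assumptions, not mRASP's actual tokens).
def tag_pairs(pairs, src_tag, tgt_tag):
    """Prepend language tokens so one model can learn both a2m and a2e."""
    return [(f"{src_tag} {src}", f"{tgt_tag} {tgt}") for src, tgt in pairs]

a2m = tag_pairs([("吾日三省吾身", "我每天多次反省自己")],
                "LANG_TOK_ANCIENT", "LANG_TOK_MODERN")
a2e = tag_pairs([("吾日三省吾身", "I examine myself several times a day.")],
                "LANG_TOK_ANCIENT", "LANG_TOK_EN")
print(a2m + a2e)
```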
AnchiLm: An Effective Classical-to-Modern Chinese Translation Model Leveraging bpe-drop and SikuRoBERTa
Jiahui Zhu | Sizhou Chen
In this paper, we present our submitted model for translating ancient texts into modern texts, which ranked sixth in the closed track for Ancient Chinese in the EvaHan 2023 shared task. Specifically, we employed two strategies to improve ancient-to-modern translation. First, we used bpe-drop to enhance the parallel corpus. Second, we used SikuRoBERTa to simultaneously initialize the translation model's encoder and decoder and to reconstruct the BPE vocabulary. In our experiments, we compared the baseline model, R-Drop, the pre-trained model, and the parameter initialization method. The experimental results show that the parameter initialization method significantly outperforms the baseline model, reaching a BLEU score of 21.75.
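bpe-drop is a subword-regularization technique in which BPE merges are randomly dropped during segmentation, so the same sentence is tokenized differently across training epochs; the sketch below shows the closely related sampling interface in SentencePiece as an illustration of the idea, with a hypothetical model file, not the authors' exact setup.

```python
# Illustration of BPE-dropout-style subword regularization with SentencePiece
# (the model file name and alpha value are assumptions, not the authors' setup).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="siku_bpe.model")  # hypothetical BPE model

sentence = "有朋自远方来，不亦乐乎"
for _ in range(3):
    # With a BPE model, enable_sampling randomly drops merges (probability alpha),
    # producing a different segmentation of the same sentence on each call.
    print(sp.encode(sentence, out_type=str,
                    enable_sampling=True, alpha=0.1, nbest_size=-1))
```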
Translating Ancient Chinese to Modern Chinese at Scale: A Large Language Model-based Approach
Jiahuan Cao | Dezhi Peng | Yongxin Shi | Zongyuan Jiang | Lianwen Jin
Recently, the emergence of large language models (LLMs) has provided powerful foundation models for a wide range of natural language processing (NLP) tasks. However, the vast majority of the pre-training corpus for most existing LLMs is in English, resulting in their Chinese proficiency falling far behind that of English. Furthermore, ancient Chinese has a much larger vocabulary and less available corpus than modern Chinese, which significantly challenges the generalization capacity of existing LLMs. In this paper, we investigate the Ancient-Chinese-to-Modern-Chinese (A2M) translation using LLMs including LLaMA and Ziya. Specifically, to improve the understanding of Chinese texts, we explore the vocabulary expansion and incremental pre-training methods based on existing pre-trained LLMs. Subsequently, a large-scale A2M translation dataset with 4M pairs is utilized to finetune the LLMs. Experimental results demonstrate the effectiveness of the proposed method, especially with Ziya-13B, in translating ancient Chinese to modern Chinese. Moreover, we deeply analyze the performance of various LLMs with different strategies, which we believe can benefit further research on LLM-based A2M approaches.
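The vocabulary expansion step described above can be sketched with the Hugging Face transformers API; the checkpoint name and the added tokens are placeholders chosen for illustration, not the vocabulary actually used in the paper.

```python
# Sketch of vocabulary expansion before incremental pre-training (checkpoint name
# and added tokens are illustrative assumptions, not the paper's actual setup).
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "huggyllama/llama-7b"  # hypothetical base LLM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Add characters that the original (largely English) vocabulary segments poorly;
# the corresponding rows of the embedding matrix are freshly initialized.
new_tokens = ["廿", "卅", "黻"]  # purely illustrative rare characters
num_added = tokenizer.add_tokens(new_tokens)
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# Incremental pre-training on ancient-Chinese text and fine-tuning on the 4M-pair
# A2M dataset would follow, as described in the abstract.
```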