CloudSheep System for WMT24 Discourse-Level Literary Translation

Lisa Liu, Ryan Liu, Angela Tsai, Jingbo Shang


Abstract
This paper describes the CloudSheep translation system for the WMT24 Discourse-Level Literary Translation shared task. We participated in the Chinese-English direction on the unconstrained track. Our approach combines several tools in a pipeline, using the strengths of each to maximize translation accuracy and the flow of the text. In particular, we focused on translating names consistently and idioms correctly. To keep names consistent throughout a text, we generated a custom name dictionary for each text, containing person and place names along with their translations. A common honorific dictionary was applied to keep titles consistent, especially in historical or cultivation novels. The names were found and translated with GPT-3.5-turbo. To produce accurate and concise translations of idioms, which are often translated literally and verbosely, we integrated the CC-CEDICT library to provide official definitions. We then used GPT-4 to pick the dictionary definition that best fits the context and to rephrase it so that it fits grammatically within the sentence. For non-name and non-idiom terms, we used Google Translate. We compared our approach’s performance against a Google Translate baseline using BLEU, chrF, and COMET, as well as A/B testing.
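The two dictionary steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the sample names, and the sample dictionary entry are hypothetical, the entry is only written in the CC-CEDICT line format (`Traditional Simplified [pinyin] /gloss/gloss/`) rather than copied from the dictionary, and the restriction to four-character idioms is an assumption. The GPT and Google Translate calls are left out entirely.

```python
import re

# CC-CEDICT line format: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
CEDICT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_cedict(lines):
    """Map each simplified headword to its list of English glosses."""
    entries = {}
    for line in lines:
        if line.startswith("#"):  # skip comment lines in the dictionary file
            continue
        match = CEDICT_LINE.match(line)
        if match:
            _trad, simp, _pinyin, glosses = match.groups()
            entries.setdefault(simp, []).extend(glosses.split("/"))
    return entries


def find_idioms(text, entries, length=4):
    """Return substrings of the given length that have a dictionary entry.

    Assumption: idioms are looked up as fixed-length (four-character) strings.
    """
    found = []
    for i in range(len(text) - length + 1):
        candidate = text[i:i + length]
        if candidate in entries and candidate not in found:
            found.append(candidate)
    return found


def apply_name_dictionary(text, names):
    """Replace each source name with its fixed translation, longest names first."""
    for source in sorted(names, key=len, reverse=True):
        text = text.replace(source, names[source])
    return text


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample_lines = [
        "# sample entry in CC-CEDICT format",
        "畫蛇添足 画蛇添足 [hua4 she2 tian1 zu2] /to overdo it/to ruin something by adding what is superfluous/",
    ]
    name_dictionary = {"李明": "Li Ming", "青云山": "Qingyun Mountain"}
    sentence = "李明在青云山画蛇添足。"

    entries = parse_cedict(sample_lines)
    for idiom in find_idioms(sentence, entries):
        print(idiom, entries[idiom])  # candidate glosses a model could choose from
    print(apply_name_dictionary(sentence, name_dictionary))
```

Replacing longer names first keeps a short name from overwriting part of a longer one that contains it. In the system described here, the candidate glosses would be passed to GPT-4 for selection and rephrasing.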
Anthology ID:
2024.wmt-1.95
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
960–966
URL:
https://aclanthology.org/2024.wmt-1.95
Cite (ACL):
Lisa Liu, Ryan Liu, Angela Tsai, and Jingbo Shang. 2024. CloudSheep System for WMT24 Discourse-Level Literary Translation. In Proceedings of the Ninth Conference on Machine Translation, pages 960–966, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
CloudSheep System for WMT24 Discourse-Level Literary Translation (Liu et al., WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.95.pdf