LitPC: A set of tools for building parallel corpora from literary works

Antoni Oliver, Sergi Alvarez-Vidal


Abstract
In this paper, we describe the LitPC toolkit, a set of tools and methods designed for the quick and effective creation of parallel corpora from literary works. Given the scarcity of curated parallel texts in this domain, the toolkit can be a useful resource. We also present a case study describing the creation of a Russian-English parallel corpus based on the literary works of Leo Tolstoy. Furthermore, an augmented version of this corpus is used both to train and to evaluate neural machine translation systems specifically adapted to the author’s style.
Anthology ID: 2024.ctt-1.3
Volume: Proceedings of the 1st Workshop on Creative-text Translation and Technology
Month: June
Year: 2024
Address: Sheffield, United Kingdom
Editors: Bram Vanroy, Marie-Aude Lefer, Lieve Macken, Paola Ruffo
Venues: CTT | WS
Publisher: European Association for Machine Translation
Pages: 21–31
URL: https://aclanthology.org/2024.ctt-1.3
Cite (ACL): Antoni Oliver and Sergi Alvarez-Vidal. 2024. LitPC: A set of tools for building parallel corpora from literary works. In Proceedings of the 1st Workshop on Creative-text Translation and Technology, pages 21–31, Sheffield, United Kingdom. European Association for Machine Translation.
Cite (Informal): LitPC: A set of tools for building parallel corpora from literary works (Oliver & Alvarez-Vidal, CTT-WS 2024)
PDF: https://aclanthology.org/2024.ctt-1.3.pdf