Building an English-Chinese Parallel Corpus Annotated with Sub-sentential Translation Techniques

Yuming Zhai, Lufei Liu, Xinyi Zhong, Gabriel Illouz, Anne Vilnat


Abstract
Besides literal translation, human translators often resort to non-literal translation techniques such as idiom equivalence, generalization, particularization, and semantic modulation, especially when the source and target languages have distant origins. Translation techniques are an important subject in translation studies and help researchers understand and analyse translated texts. However, they have received less attention in the development of Natural Language Processing (NLP) applications. To fill this gap, one of our long-term objectives is to gain better semantic control over the extraction of paraphrases from bilingual parallel corpora. With this goal in mind, we hypothesize that different sub-sentential translation techniques can be recognized automatically. Since no dedicated data set exists for English-Chinese for this original task, we manually annotated a parallel corpus covering eleven genres. Fifty sentence pairs per genre were annotated in order to consolidate our annotation guidelines. Based on this data set, we conducted an experiment on classifying translations as literal or non-literal. The preliminary results confirm our hypothesis. The corpus and code are available. We hope that this annotated corpus will be useful for contrastive linguistic studies and for fine-grained evaluation of NLP tasks such as automatic word alignment and machine translation.
Anthology ID:
2020.lrec-1.496
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4024–4033
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.496
Cite (ACL):
Yuming Zhai, Lufei Liu, Xinyi Zhong, Gabriel Illouz, and Anne Vilnat. 2020. Building an English-Chinese Parallel Corpus Annotated with Sub-sentential Translation Techniques. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4024–4033, Marseille, France. European Language Resources Association.
Cite (Informal):
Building an English-Chinese Parallel Corpus Annotated with Sub-sentential Translation Techniques (Zhai et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.496.pdf