Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies

Bruno Cartoni, Thomas Meyer


Abstract
Translation studies rely increasingly on corpus data to examine the specificities of translated texts, which may be translated from different original languages and compared to original texts. In parallel, more and more multilingual corpora are becoming available for various natural language processing tasks. This paper questions the use of these multilingual corpora in translation studies and shows the methodological steps needed to obtain more reliably comparable sub-corpora that consist of original and directly translated text only. Various experiments are presented that show the advantage of directional sub-corpora.
Anthology ID:
L12-1050
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2132–2137
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/188_Paper.pdf
Cite (ACL):
Bruno Cartoni and Thomas Meyer. 2012. Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2132–2137, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies (Cartoni & Meyer, LREC 2012)