Collection of a Simultaneous Translation Corpus for Comparative Analysis

Hiroaki Shimizu; Graham Neubig; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura

Abstract

This paper describes the collection of an English-Japanese/Japanese-English simultaneous interpretation corpus. The corpus has two main features. The first is that professional simultaneous interpreters with different amounts of experience participated in the collection. By comparing the output of each interpreter, it is possible to compare better interpretations with those that are not as good. The second is that translation data are already available for part of the corpus, which makes it possible to compare translation data with simultaneous interpretation data. We recorded interpretations of lectures and news, and created time-aligned transcriptions. A total of 387k words of transcribed data were collected. The corpus will be helpful for analyzing differences in interpretation styles and for constructing simultaneous interpretation systems.

Anthology ID: L14-1178
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 670–673
URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/162_Paper.pdf
Cite (ACL): Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2014. Collection of a Simultaneous Translation Corpus for Comparative Analysis. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 670–673, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal): Collection of a Simultaneous Translation Corpus for Comparative Analysis (Shimizu et al., LREC 2014)
PDF: http://www.lrec-conf.org/proceedings/lrec2014/pdf/162_Paper.pdf
