Construction and Analysis of Word-level Time-aligned Simultaneous Interpretation Corpus

Takahiro Ono, Hitomi Tohyama, Shigeki Matsubara


Abstract
In this paper, quantitative analyses of the delay in Japanese-to-English (J-E) and English-to-Japanese (E-J) interpretations are described. The Simultaneous Interpretation Database of Nagoya University (SIDB) was used for the analyses. Beginning time and end time of each word were provided to the corpus using HMM-based phoneme segmentation, and the time lag between the corresponding words was calculated as the word-level delay. Word-level delay was calculated for 3,722 pairs and 4,932 pairs of words for J-E and E-J interpretations, respectively. The analyses revealed that J-E interpretation has much larger delay than E-J interpretation and that the difference of word order between Japanese and English affect the degree of delay.
Anthology ID:
L08-1490
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/860_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Takahiro Ono, Hitomi Tohyama, and Shigeki Matsubara. 2008. Construction and Analysis of Word-level Time-aligned Simultaneous Interpretation Corpus. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Construction and Analysis of Word-level Time-aligned Simultaneous Interpretation Corpus (Ono et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/860_paper.pdf