Hiroaki Shimizu
2014
Collection of a Simultaneous Translation Corpus for Comparative Analysis
Hiroaki Shimizu
|
Graham Neubig
|
Sakriani Sakti
|
Tomoki Toda
|
Satoshi Nakamura
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes the collection of an English-Japanese/Japanese-English simultaneous interpretation corpus. There are two main features of the corpus. The first is that professional simultaneous interpreters with different amounts of experience cooperated with the collection. By comparing data from simultaneous interpretation of each interpreter, it is possible to compare better interpretations to those that are not as good. The second is that for part of our corpus there are already translation data available. This makes it possible to compare translation data with simultaneous interpretation data. We recorded the interpretations of lectures and news, and created time-aligned transcriptions. A total of 387k words of transcribed data were collected. The corpus will be helpful to analyze differences in interpretations styles and to construct simultaneous interpretation systems.
2013
Constructing a speech translation system using simultaneous interpretation data
Hiroaki Shimizu
|
Graham Neubig
|
Sakriani Sakti
|
Tomoki Toda
|
Satoshi Nakamura
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers
There has been a fair amount of work on automatic speech translation systems that translate in real-time, serving as a computerized version of a simultaneous interpreter. It has been noticed in the field of translation studies that simultaneous interpreters perform a number of tricks to make the content easier to understand in real-time, including dividing their translations into small chunks, or summarizing less important content. However, the majority of previous work has not specifically considered this fact, simply using translation data (made by translators) for learning of the machine translation system. In this paper, we examine the possibilities of additionally incorporating simultaneous interpretation data (made by simultaneous interpreters) in the learning process. First we collect simultaneous interpretation data from professional simultaneous interpreters of three levels, and perform an analysis of the data. Next, we incorporate the simultaneous interpretation data in the learning of the machine translation system. As a result, the translation style of the system becomes more similar to that of a highly experienced simultaneous interpreter. We also find that according to automatic evaluation metrics, our system achieves performance similar to that of a simultaneous interpreter that has 1 year of experience.
2012
Minimum Bayes-risk decoding extended with similar examples: NAIST-NCT at IWSLT 2012
Hiroaki Shimizu
|
Masao Utiyama
|
Eiichiro Sumita
|
Satoshi Nakamura
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes our methods used in the NAIST-NICT submission to the International Workshop on Spoken Language Translation (IWSLT) 2012 evaluation campaign. In particular, we propose two extensions to minimum bayes-risk decoding which reduces a expected loss.
Search
Fix data
Co-authors
- Satoshi Nakamura 3
- Graham Neubig 2
- Sakriani Sakti 2
- Tomoki Toda 2
- Eiichiro Sumita 1
- show all...