Kosuke Doi


2022

pdf bib
NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022
Ryo Fukuda | Yuka Ko | Yasumasa Kano | Kosuke Doi | Hirotaka Tokuyama | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper describes NAIST’s simultaneous speech translation systems developed for IWSLT 2022 Evaluation Campaign. We participated the speech-to-speech track for English-to-German and English-to-Japanese. Our primary submissions were end-to-end systems using adaptive segmentation policies based on Prefix Alignment.

2021

pdf bib
NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task
Ryo Fukuda | Yui Oka | Yasumasa Kano | Yuki Yano | Yuka Ko | Hirotaka Tokuyama | Kosuke Doi | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes NAIST’s system for the English-to-Japanese Simultaneous Text-to-text Translation Task in IWSLT 2021 Evaluation Campaign. Our primary submission is based on wait-k neural machine translation with sequence-level knowledge distillation to encourage literal translation.

pdf bib
Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data
Kosuke Doi | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. Their latency, quality, and word order aspects were compared among the SI data themselves as well as against offline translations. The results showed that (1) interpreters with more experience controlled the latency and quality better, and (2) large latency hurt the SI quality.