Segmentation and punctuation prediction in speech language translation using a monolingual translation system

Eunah Cho, Jan Niehues, Alex Waibel


Abstract
In spoken language translation (SLT), finding proper segmentation and reconstructing punctuation marks are both important and challenging tasks. In this paper we present our recent work on analyzing and improving German-English speech translation quality through better sentence segmentation and punctuation. Through oracle experiments, we show the upper bound on translation quality that would be obtained with human-generated segmentation and punctuation on the output stream of speech recognition systems. In these oracle experiments we gain 1.78 BLEU points on the lecture test set. We build a monolingual translation system from German to German that implements segmentation and punctuation prediction as a machine translation task. Using the monolingual translation system we obtain an improvement of 1.53 BLEU points on the lecture test set, which is comparable to the upper bound established by the oracle experiments.
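The idea of casting punctuation prediction as German-to-German translation can be illustrated with a minimal sketch of how one parallel training pair might be derived from punctuated text. This is not the authors' pipeline: the punctuation inventory, the tokenizer, and the function names `tokenize` and `make_training_pair` are assumptions made for illustration only. The source side imitates unpunctuated recognizer output, and the target side keeps the punctuation.

```python
import re

# Punctuation marks treated as prediction targets.
# This inventory is an assumption for the sketch, not taken from the paper.
PUNCT = {".", ",", "?", "!"}


def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def make_training_pair(punctuated):
    """Return (source, target) for one monolingual 'translation' example.

    source: tokens with punctuation removed (imitating ASR output)
    target: the same tokens with punctuation kept
    """
    tokens = tokenize(punctuated)
    source = [t for t in tokens if t not in PUNCT]
    return " ".join(source), " ".join(tokens)


src, tgt = make_training_pair("Guten Morgen, wie geht es Ihnen?")
print(src)
print(tgt)
```

A translation system trained on many such pairs would learn to map the unpunctuated side to the punctuated side; how the training data is actually segmented and prepared is described in the paper itself.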
Anthology ID:
2012.iwslt-papers.15
Volume:
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Month:
December 6-7
Year:
2012
Address:
Hong Kong
Venue:
IWSLT
SIG:
SIGSLT
Pages:
252–259
URL:
https://aclanthology.org/2012.iwslt-papers.15
Cite (ACL):
Eunah Cho, Jan Niehues, and Alex Waibel. 2012. Segmentation and punctuation prediction in speech language translation using a monolingual translation system. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 252–259, Hong Kong.
Cite (Informal):
Segmentation and punctuation prediction in speech language translation using a monolingual translation system (Cho et al., IWSLT 2012)
PDF:
https://aclanthology.org/2012.iwslt-papers.15.pdf