An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation

Xiaolin Wang, Andrew Finch, Masao Utiyama, Eiichiro Sumita


Abstract
Simultaneous interpretation is a very challenging application of machine translation in which the input is a stream of words from a speech recognition engine. The key problem is how to segment this stream online into units suitable for translation. The segmenter first computes, for each word, a confidence score indicating how plausible it is to place a sentence boundary after that word, and then applies heuristics to decide where the boundaries go. Multiple variants of the confidence scoring method and of the segmentation heuristics were studied. Experimental results show that the best-performing strategy is not only efficient in terms of average latency per word but also achieves end-to-end translation quality close to that of both an offline baseline and oracle segmentation.
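The abstract does not specify the scoring model or the heuristics, so what follows is only a minimal Python sketch of the general two-stage scheme under stated assumptions. A caller-supplied `score` function (hypothetical) returns the boundary confidence for the newest word. A boundary is placed as soon as that confidence reaches a `threshold`, and a `max_len` cap (also an assumed parameter) forces a cut after the highest-scoring buffered word so latency stays bounded. `segment_stream`, `toy_score` and the toy score table are illustrative names, not the authors' implementation; a real system would derive the confidence from a trained model.

```python
from typing import Callable, Iterable, Iterator, List

# Minimal sketch (assumed design, not the paper's code).
# Hypothetical interface: score(history) returns a confidence in [0, 1]
# that a sentence boundary belongs after history[-1].
ScoreFn = Callable[[List[str]], float]


def segment_stream(
    words: Iterable[str],
    score: ScoreFn,
    threshold: float = 0.5,
    max_len: int = 20,
) -> Iterator[List[str]]:
    """Yield segments online, consuming one word at a time.

    Assumed heuristic: cut as soon as the newest word's confidence reaches
    `threshold`; if the buffer reaches `max_len` words first, cut after the
    highest-scoring buffered word so that latency stays bounded.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    buffer: List[str] = []
    scores: List[float] = []
    for word in words:
        buffer.append(word)
        scores.append(score(buffer))
        if scores[-1] >= threshold:
            # Confident boundary after the newest word: emit the whole buffer.
            yield buffer
            buffer, scores = [], []
        elif len(buffer) >= max_len:
            # Forced cut at the best boundary seen so far (earliest on ties).
            k = max(range(len(scores)), key=scores.__getitem__)
            yield buffer[: k + 1]
            buffer = buffer[k + 1 :]
            # Re-score leftover words in their new, shorter context; only
            # newly arriving words can trigger a threshold cut.
            scores = [score(buffer[: j + 1]) for j in range(len(buffer))]
    if buffer:
        # End of stream: flush whatever remains as a final segment.
        yield buffer


# Toy stand-in for a real confidence model; the values are made up.
TOY_SCORES = {"today": 0.9, "fine": 0.4, "clear": 0.3, "thanks": 0.9}


def toy_score(history: List[str]) -> float:
    return TOY_SCORES.get(history[-1], 0.1)


stream = ("we meet today the weather is fine and the sky is clear "
          "and the air is warm thanks").split()
for seg in segment_stream(stream, toy_score, threshold=0.5, max_len=6):
    print(" ".join(seg))
# prints:
# we meet today
# the weather is fine
# and the sky is clear
# and the air is warm thanks
```

Cutting at the best-scoring buffered word, rather than blindly at `max_len`, is one plausible way to trade translation quality against latency; the paper studies several heuristic variants that this sketch does not attempt to reproduce.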
Anthology ID:
W16-4613
Volume:
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Toshiaki Nakazawa, Hideya Mino, Chenchen Ding, Isao Goto, Graham Neubig, Sadao Kurohashi, Ir. Hammam Riza, Pushpak Bhattacharyya
Venue:
WAT
Publisher:
The COLING 2016 Organizing Committee
Pages:
139–148
URL:
https://aclanthology.org/W16-4613
Cite (ACL):
Xiaolin Wang, Andrew Finch, Masao Utiyama, and Eiichiro Sumita. 2016. An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), pages 139–148, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation (Wang et al., WAT 2016)
PDF:
https://aclanthology.org/W16-4613.pdf