Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German

Christian Federmann, William D. Lewis


Abstract
We describe the Microsoft Speech Language Translation (MSLT) corpus, which was created to evaluate end-to-end conversational speech translation quality. The corpus was created from actual conversations over Skype, and we provide details on the recording setup and the different layers of associated text data. The corpus release includes Test and Dev sets with reference transcripts for speech recognition. Additionally, cleaned-up transcripts and reference translations are available for evaluating machine translation quality. The IWSLT 2016 release described here includes the source audio, raw transcripts, cleaned-up transcripts, and translations to or from English for both French and German.
Anthology ID:
2016.iwslt-1.12
Volume:
Proceedings of the 13th International Conference on Spoken Language Translation
Month:
December 8-9
Year:
2016
Address:
Seattle, Washington
Editors:
Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Rolando Cattoni, Marcello Federico
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
International Workshop on Spoken Language Translation
URL:
https://aclanthology.org/2016.iwslt-1.12
Cite (ACL):
Christian Federmann and William D. Lewis. 2016. Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German. In Proceedings of the 13th International Conference on Spoken Language Translation, Seattle, Washington. International Workshop on Spoken Language Translation.
Cite (Informal):
Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German (Federmann & Lewis, IWSLT 2016)
PDF:
https://aclanthology.org/2016.iwslt-1.12.pdf