The Development of the Multilingual LUNA Corpus for Spoken Language System Porting

Evgeny Stepanov, Giuseppe Riccardi, Ali Orkan Bayer


Abstract
Annotated corpora are critical to developing speech applications for multiple target languages. While the technology for building a monolingual speech application has reached satisfactory results in terms of both performance and effort, porting an existing application from a source language to a target language remains very expensive. In this paper we address the problem of creating multilingual aligned corpora and evaluating them in the context of a spoken language understanding (SLU) porting task. We discuss the challenges of creating multilingual corpora manually and present algorithms for creating multilingual SLU via Statistical Machine Translation (SMT).
Anthology ID:
L14-1613
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2675–2678
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/789_Paper.pdf
Cite (ACL):
Evgeny Stepanov, Giuseppe Riccardi, and Ali Orkan Bayer. 2014. The Development of the Multilingual LUNA Corpus for Spoken Language System Porting. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2675–2678, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
The Development of the Multilingual LUNA Corpus for Spoken Language System Porting (Stepanov et al., LREC 2014)