2014
Semi-automatic annotation of the UCU accents speech corpus
Rosemary Orr | Marijn Huijbregts | Roeland van Beek | Lisa Teunissen | Kate Backhouse | David van Leeuwen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Annotation and labeling of speech tasks in large multitask speech corpora is a necessary part of preparing a corpus for distribution. We address three approaches to annotation and labeling: manual, semi-automatic and automatic procedures for labeling the UCU Accent Project speech data, a multilingual multitask longitudinal speech corpus. Accuracy and minimal time investment are the priorities in assessing the efficacy of each procedure. While manual labeling based on aural and visual input should produce the most accurate results, this approach is error-prone because of its repetitive nature. A semi-automatic event detection system requiring manual rejection of false alarms and location and labeling of misses provided the best results. A fully automatic system could not be applied to entire speech recordings because of the variety of tasks and genres. However, it could be used to annotate separate sentences within a specific task. Acoustic confidence measures can correctly detect sentences that do not match the text with an EER of 3.3%.
2006
The Dutch-Flemish HLT Programme STEVIN: Essential Speech and Language Technology Resources
Elisabeth D’Halleweyn | Jan Odijk | Lisanne Teunissen | Catia Cucchiarini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In 2004 a consortium of ministries and organizations in the Netherlands and Flanders launched the comprehensive Dutch-Flemish HLT programme STEVIN (a Dutch acronym for Essential Speech and Language Technology Resources). To guarantee its Dutch-Flemish character, this large-scale programme is carried out under the auspices of the intergovernmental Dutch Language Union (NTU). The aim of STEVIN is to contribute to the further progress of HLT for the Dutch language by raising awareness of HLT results, stimulating the demand for HLT products, promoting strategic research in HLT, and developing HLT resources that are essential and are known to be missing. Furthermore, a structure was set up for the management, maintenance and distribution of HLT resources. The STEVIN programme, which will run from 2004 to 2009, resulted from HLT activities in the Dutch language area, which were reported on at previous LREC conferences (2000, 2002, 2004). In this paper we will explain how different activities are combined in one comprehensive programme. We will show how cooperation can successfully be realized between different parties (language and speech technology, Flanders and the Netherlands, academia, industry and policy institutions) so as to achieve one common goal: progress in HLT.
2002
A Human Language Technologies Platform for the Dutch language: awareness, management maintenance and distribution
Catia Cucchiarini | Elisabeth D’Halleweyn | Lisanne Teunissen
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)