Nina Grønnum
2006
DanPASS - A Danish Phonetically Annotated Spontaneous Speech Corpus
Nina Grønnum
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
A corpus is described consisting of non-scripted monologues and dialogues, recorded by 22 speakers, comprising a total of about 70.000 words, corresponding to well over 10 hours of speech. The monologues were recorded as one-way communication with blind partner where the speaker performed three different tasks: (S)he described a network consisting of various geometrical shapes in various colours. (S)he guided the listener through four different routes in a virtual city map.(S)he instructed the listener how to build a house from its individual parts. The dialogues are replicas of the HCRC map tasks (http://www.hcrc.ed.ac.uk/maptask/). Annotation is performed in Praat. The sound files are segmented into prosodic phrases, words, and syllables, always to the nearest zero-crossing in the waveform. It is supplied, in seven separate interval tiers, with an orthographical transcription, detailed part-of-speech tags, simplified part-of-speech tags, a phonological transcription, a broad phonetic transcription, the pitch relation between each stressed and post-tonic syllable, the phrasal intonation, and an empty tier for comments.