Polish Rhythmic Database ― New Resources for Speech Timing and Rhythm Analysis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper reports on a new resource, the Polish rhythmic database, and on tools developed to investigate timing phenomena and rhythmic structure in Polish. Topics include, among others, the effect of speaking style and tempo on timing patterns, phonotactic and phrasal properties of speech rhythm, and the stability of rhythm metrics. So far, 19 native and 12 non-native speakers with different first languages have been recorded. The collected speech data (5 h 14 min) represent five different speaking styles and five different tempi. For the needs of speech corpus management, annotation and analysis, a database was developed and integrated with Annotation Pro (Klessa et al., 2013; Klessa, 2016). Currently, the database is the only resource for Polish that allows for a systematic study of a broad range of phenomena related to speech timing and rhythm. The paper also introduces new tools and methods developed to facilitate annotation and analysis of the database with respect to various timing and rhythm measures. Finally, results of ongoing research and first experimental results obtained with the new resources are reported, and future work is sketched.
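The abstract does not say which rhythm measures the tools compute. As an illustration only, the sketch below computes the normalised Pairwise Variability Index (nPVI), a widely used rhythm metric, from a sequence of interval durations. The function name `npvi` and the input layout (a plain list of durations, for example vocalic intervals in milliseconds) are assumptions made for this example and are not taken from the paper or from Annotation Pro.

```python
from typing import Sequence


def npvi(durations: Sequence[float]) -> float:
    """Normalised Pairwise Variability Index of successive interval durations.

    nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    All durations must be positive and share one unit (e.g. milliseconds).
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    # Compare each interval with its successor, normalised by their mean.
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical vocalic interval durations in milliseconds.
    print(npvi([80, 120, 60, 150]))
```

Because each pairwise difference is divided by the local mean duration, the measure is less sensitive to overall tempo than raw duration variability, which is one reason it is often chosen when comparing speaking rates.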
Developing and evaluating an emergency scenario dialogue corpus
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The present paper describes the development and evaluation of the Polish emergency dialogue corpus recorded for studying alignment phenomena in stress scenarios. The challenge is that emergency dialogues are more complex on many levels than standard information negotiation dialogues, that different resources are needed for differential investigation, and that resources for this kind of corpus are rare. Currently there is no comparable corpus for Polish. In the present context, alignment is understood as adaptation on the syntactic, semantic and pragmatic levels of communication between the two interlocutors, including the choice of similar lexical items and speaking style. Four different dialogue scenarios were arranged and speech prompt material was created. Two maps for the map tasks and one emergency diapix were designed to prompt semi-spontaneous dialogues simulating stress and natural communicative situations. The dialogue corpus was recorded taking into account the public character of conversations in the emergency setting. The linguistic study of alignment in this kind of dialogue made it possible to design and implement a prototype of a Polish adaptive dialogue system to support stress scenario communication (not described in this paper).
An Automatic Close Copy Speech Synthesis Tool for Large-Scale Speech Corpus Evaluation
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The production of rich multilingual speech corpus resources on a large scale is a requirement for many linguistic, phonetic and technological tasks, in both research and application domains. It is also time-consuming and therefore expensive. The human component in the resource creation process is also prone to inconsistencies, a situation frequently documented in cross-transcriber consistency studies. In the present case, corpora of three languages were to be evaluated and corrected: (1) Polish, a large automatically annotated and manually corrected single-speaker TTS unit-selection corpus in the BOSS Label File (BLF) format, (2) German and (3) English, the second and third being manually annotated multi-speaker story-telling learner corpora in Praat TextGrid format. A method is provided for supporting the evaluation and correction of time-aligned annotations for the three corpora by permitting a rapid audio screening of the annotations by an expert listener for the detection of perceptually conspicuous systematic or isolated errors in the annotations. The criterion for perceptual conspicuousness was provided by converting the annotation formats into the interface format required by the MBROLA speech synthesiser. The audio screening procedure is complementary to other methods of corpus evaluation and does not replace them.
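The conversion step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: it assumes the annotation has already been read into a list of `(label, start_s, end_s)` tuples (parsing BLF or TextGrid files is left out), that labels are already valid phoneme symbols for the chosen MBROLA voice, and that an optional flat pitch value stands in for whatever pitch handling the original procedure used. The function name `intervals_to_pho` is hypothetical. The output follows the MBROLA `.pho` layout of one segment per line: phoneme symbol, duration in milliseconds, then optional position (percent) and pitch (Hz) pairs, with `_` for silence.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[str, float, float]  # (label, start in s, end in s)


def intervals_to_pho(
    intervals: Iterable[Interval],
    pitch_hz: Optional[int] = None,  # assumption: one flat pitch target
    silence_label: str = "_",
) -> str:
    """Convert time-aligned segments into MBROLA .pho text."""
    lines = []
    for label, start, end in intervals:
        if end <= start:
            raise ValueError(f"non-positive duration for {label!r}")
        duration_ms = round((end - start) * 1000)
        # Empty labels are treated as pauses (an assumption of this sketch).
        symbol = label.strip() or silence_label
        line = f"{symbol} {duration_ms}"
        if pitch_hz is not None and symbol != silence_label:
            # One pitch target at 50% of the segment duration.
            line += f" 50 {pitch_hz}"
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical segments; real input would come from the annotation files.
    demo = [("", 0.0, 0.25), ("t", 0.25, 0.375), ("a", 0.375, 0.625)]
    print(intervals_to_pho(demo, pitch_hz=120), end="")
```

Since the synthesised durations come directly from the annotation boundaries, a misplaced boundary or wrong label tends to be audible in the resynthesis, which is the property the screening procedure relies on.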