2019
A Scalable Method for Quantifying the Role of Pitch in Conversational Turn-Taking
Kornel Laskowski, Marcin Wlodarczak, Mattias Heldner
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Pitch has long been held as an important signalling channel when planning and deploying speech in conversation, and myriad studies have been undertaken to determine the extent to which it actually plays this role. Unfortunately, these studies have required considerable human investment in data preparation and analysis, and have therefore often been limited to a handful of specific conversational contexts. The current article proposes a framework which addresses these limitations, by enabling a scalable, quantitative characterization of the role of pitch throughout an entire conversation, requiring only the raw signal and speech activity references. The framework is evaluated on the Switchboard dialogue corpus. Experiments indicate that pitch trajectories of both parties are predictive of their incipient speech activity; that pitch should be expressed on a logarithmic scale and Z-normalized, as well as accompanied by a binary voicing variable; and that only the most recent 400 ms of the pitch trajectory are useful in incipient speech activity prediction.
2016
A framework for the automatic inference of stochastic turn-taking styles
Kornel Laskowski
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue
2010
Modeling Norms of Turn-Taking in Multi-Party Conversation
Kornel Laskowski
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
A Snack Implementation and Tcl/Tk Interface to the Fundamental Frequency Variation Spectrum Algorithm
Kornel Laskowski, Jens Edlund
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Intonation is an important aspect of vocal production, used for a variety of communicative needs. Its modeling is therefore crucial in many speech understanding systems, particularly those requiring inference of speaker intent in real time. However, the estimation of pitch, traditionally the first step in intonation modeling, is computationally inconvenient in such scenarios. This is because it is often, and optimally, achieved only after speech segmentation and recognition. A consequence is that earlier speech processing components, in today's state-of-the-art systems, lack intonation awareness by fiat; it is not known to what extent this circumscribes their performance. In the current work, we present a freely available implementation of an alternative to pitch estimation, namely the computation of the fundamental frequency variation (FFV) spectrum, which can be easily employed at any level within a speech processing system. It is our hope that the implementation we describe will aid in the understanding of this novel acoustic feature space, and that it will facilitate its inclusion, as desired, in the front-end routines of speech recognition, dialog act recognition, and speaker recognition systems.
2008
Modeling Vocal Interaction for Text-Independent Participant Characterization in Multi-Party Conversation
Kornel Laskowski, Mari Ostendorf, Tanja Schultz
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue
A Comparative Cross-Domain Study of the Occurrence of Laughter in Meeting and Seminar Corpora
Susanne Burger, Kornel Laskowski, Matthias Woelfel
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Laughter is an intrinsic component of human-human interaction, and current automatic speech understanding paradigms stand to gain significantly from its detection and modeling. In the current work, we produce a manual segmentation of laughter in a large corpus of interactive multi-party seminars, which promises to be a valuable resource for acoustic modeling purposes. More importantly, we quantify the occurrence of laughter in this new domain, and contrast our observations with findings for laughter in multi-party meetings. Our analyses show that, with respect to the majority of measures we explore, the occurrence of laughter in both domains is quite similar.
2007
A Geometric Interpretation of Non-Target-Normalized Maximum Cross-Channel Correlation for Vocal Activity Detection in Meetings
Kornel Laskowski, Tanja Schultz
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Modeling Vocal Interaction for Text-Independent Classification of Conversation Type
Kornel Laskowski, Mari Ostendorf, Tanja Schultz
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue
2006
Annotation and Analysis of Emotionally Relevant Behavior in the ISL Meeting Corpus
Kornel Laskowski, Susanne Burger
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
We present an annotation scheme for emotionally relevant behavior at the speaker contribution level in multiparty conversation. Three annotators applied the scheme to a large, publicly available meeting corpus, which was subsequently labeled with emotional valence. We report inter-labeler agreement statistics for the two schemes, and explore the correlation between speaker valence and behavior, as well as that between speaker valence and the previous speaker's behavior. Our analyses show that the co-occurrence of certain behaviors and valence classes deviates significantly from what is to be expected by chance; in isolated cases, behaviors are predictive of valence.
2002
Improvements in Non-Verbal Cue Identification Using Multilingual Phone Strings
Tanja Schultz, Qin Jin, Kornel Laskowski, Alicia Tribble, Alex Waibel
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems