Susanne Burger


2024

The scarcity of public datasets for the summarization of medical conversations has been a limiting factor for advancing NLP research in the healthcare domain, and the structure of the existing data is largely limited to the simple format of conversation-summary pairs. We therefore propose a novel Incremental Note Generation (ING) annotation framework capable of greatly enriching summarization datasets in the healthcare domain and beyond. Our framework is designed to capture the human summarization process via an annotation task by instructing the annotators to first incrementally create a draft note as they accumulate information through a conversation transcript (Generation) and then polish the draft note into a reference note (Rewriting). The annotation results include both the reference note and a comprehensive editing history of the draft note in tabular format. Our pilot study on the task of SOAP note generation showed reasonable consistency between four expert annotators, established a solid baseline for quantitative targets of inter-rater agreement, and demonstrated the ING framework as an improvement over the traditional annotation process for future modeling of summarization.

2012

2011

2008

This paper describes in detail the data that was collected and annotated during the third and final year of the CHIL project. This data was used for the CLEAR evaluation campaign in spring 2007. The paper also introduces the CHIL Evaluation Package 2007 that resulted from this campaign including a complete description of the performed evaluation tasks. This evaluation package will be made available to the community through the ELRA General Catalogue.
Laughter is an intrinsic component of human-human interaction, and current automatic speech understanding paradigms stand to gain significantly from its detection and modeling. In the current work, we produce a manual segmentation of laughter in a large corpus of interactive multi-party seminars, which promises to be a valuable resource for acoustic modeling purposes. More importantly, we quantify the occurrence of laughter in this new domain, and contrast our observations with findings for laughter in multi-party meetings. Our analyses show that, with respect to the majority of measures we explore, the occurrence of laughter in both domains is quite similar.

2006

Recent improvements in speech recognition technology have resulted in products that can now demonstrate commercial value in a variety of applications. Many vendors are marketing products which combine ASR applications including continuous dictation, command-and-control interfaces, and transcription of recorded speech at an accuracy of 98%. In this study, we measured the accuracy of certain commercially available desktop speech recognition engines in multiple languages. Using word error rate as a benchmark, this work compares recognition accuracy across eight languages and the products of three manufacturers. Results show that two systems performed almost the same while a third system recognized at lower accuracy, although none of the systems reached the claimed accuracy. Read speech was recognized better than spontaneous speech. The systems for US-English, Japanese and Spanish showed higher accuracy than the systems for UK-English, German, French and Chinese.
We present an annotation scheme for emotionally relevant behavior at the speaker contribution level in multiparty conversation. Three annotators applied the scheme to a large, publicly available meeting corpus, which was subsequently labeled with emotional valence. We report inter-labeler agreement statistics for the two schemes, and explore the correlation between speaker valence and behavior, as well as that between speaker valence and the previous speaker's behavior. Our analyses show that the co-occurrence of certain behaviors and valence classes significantly deviates from what is to be expected by chance; in isolated cases, behaviors are predictive of valence.

2004

2003

When multilingual communication through a speech-to-speech translation system is supported by multimodal features, e.g. pen-based gestures, the following issues arise concerning the nature of the supported communication: a) to what extent multilingual communication differs from ‘ordinary’ monolingual communication with respect to the dialogue structure and the communicative strategies used by participants; b) what the patterns of integration between speech and gestures are. Building on the outcomes of previous work, we present results from a study aimed at addressing those issues. The initial findings confirm that multilingual communication, and the way in which it is realized by actual systems (e.g., with or without the push-to-talk mode), affect the form and structure of the conversation.

2002

2000

1999