2024
pdf
bib
abs
Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner
|
Seanie Lee
|
Ilja Baumann
|
Philipp Seeberger
|
Korbinian Riedhammer
|
Tobias Bocklet
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
In this work, we optimize speculative sampling for parallel hardware accelerators to improve sampling speed. We notice that substantial portions of the intermediate matrices necessary for speculative sampling can be computed concurrently. This allows us to distribute the workload across multiple GPU threads, enabling simultaneous operations on matrix segments within thread blocks. This results in profiling time improvements ranging from 6% to 13% relative to the baseline implementation, without compromising accuracy. To further accelerate speculative sampling, probability distributions parameterized by softmax are approximated by sigmoid. This approximation approach results in significantly greater relative improvements in profiling time, ranging from 37% to 94%, with a minor decline in accuracy. We conduct extensive experiments on both automatic speech recognition and summarization tasks to validate the effectiveness of our optimization methods.
pdf
bib
abs
MMUTF: Multimodal Multimedia Event Argument Extraction with Unified Template Filling
Philipp Seeberger
|
Dominik Wagner
|
Korbinian Riedhammer
Findings of the Association for Computational Linguistics: EMNLP 2024
With the advancement of multimedia technologies, news documents and user-generated content are often represented as multiple modalities, making Multimedia Event Extraction (MEE) an increasingly important challenge. However, recent MEE methods employ weak alignment strategies and data augmentation with simple classification models, which ignore the capabilities of natural language-formulated event templates for the challenging Event Argument Extraction (EAE) task. In this work, we focus on EAE and address this issue by introducing a unified template filling model that connects the textual and visual modalities via textual prompts. This approach enables the exploitation of cross-ontology transfer and the incorporation of event-specific semantics. Experiments on the M2E2 benchmark demonstrate the effectiveness of our approach. Our system surpasses the current SOTA on textual EAE by +7% F1, and performs generally better than the second-best systems for multimedia EAE.
2023
pdf
bib
Information Type Classification with Contrastive Task-Specialized Sentence Encoders
Philipp Seeberger
|
Tobias Bocklet
|
Korbinian Riedhammer
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)
2022
pdf
bib
abs
KSoF: The Kassel State of Fluency Dataset – A Therapy Centered Dataset of Stuttering
Sebastian Bayerl
|
Alexander Wolff von Gudenberg
|
Florian Hönig
|
Elmar Noeth
|
Korbinian Riedhammer
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Stuttering is a complex speech disorder that negatively affects an individual’s ability to communicate effectively. Persons who stutter (PWS) often suffer considerably under the condition and seek help through therapy. Fluency shaping is a therapy approach where PWSs learn to modify their speech to help them to overcome their stutter. Mastering such speech techniques takes time and practice, even after therapy. Shortly after therapy, success is evaluated highly, but relapse rates are high. To be able to monitor speech behavior over a long time, the ability to detect stuttering events and modifications in speech could help PWSs and speech pathologists to track the level of fluency. Monitoring could create the ability to intervene early by detecting lapses in fluency. To the best of our knowledge, no public dataset is available that contains speech from people who underwent stuttering therapy that changed the style of speaking. This work introduces the Kassel State of Fluency (KSoF), a therapy-based dataset containing over 5500 clips of PWSs. The clips were labeled with six stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and – specific to therapy – speech modifications. The audio was recorded during therapy sessions at the Institut der Kasseler Stottertherapie. The data will be made available for research purposes upon request.
pdf
bib
abs
Annotation of Valence Unfolding in Spoken Personal Narratives
Aniruddha Tammewar
|
Franziska Braun
|
Gabriel Roccabruna
|
Sebastian Bayerl
|
Korbinian Riedhammer
|
Giuseppe Riccardi
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Personal Narrative (PN) is the recollection of individuals’ life experiences, events, and thoughts along with the associated emotions in the form of a story. Compared to other genres such as social media texts or microblogs, where people write about experienced events or products, the spoken PNs are complex to analyze and understand. They are usually long and unstructured, involving multiple and related events, characters as well as thoughts and emotions associated with events, objects, and persons. In spoken PNs, emotions are conveyed by changing the speech signal characteristics as well as the lexical content of the narrative. In this work, we annotate a corpus of spoken personal narratives, with the emotion valence using discrete values. The PNs are segmented into speech segments, and the annotators annotate them in the discourse context, with values on a 5-point bipolar scale ranging from -2 to +2 (0 for neutral). In this way, we capture the unfolding of the PNs events and changes in the emotional state of the narrator. We perform an in-depth analysis of the inter-annotator agreement, the relation between the label distribution w.r.t. the stimulus (positive/negative) used for the elicitation of the narrative, and compare the segment-level annotations to a baseline continuous annotation. We find that the neutral score plays an important role in the agreement. We observe that it is easy to differentiate the positive from the negative valence while the confusion with the neutral label is high. Keywords: Personal Narratives, Emotion Annotation, Segment Level Annotation
pdf
bib
abs
Enhancing Crisis-Related Tweet Classification with Entity-Masked Language Modeling and Multi-Task Learning
Philipp Seeberger
|
Korbinian Riedhammer
Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)
Social media has become an important information source for crisis management and provides quick access to ongoing developments and critical information. However, classification models suffer from event-related biases and highly imbalanced label distributions which still poses a challenging task. To address these challenges, we propose a combination of entity-masked language modeling and hierarchical multi-label classification as a multi-task learning problem. We evaluate our method on tweets from the TREC-IS dataset and show an absolute performance gain w.r.t. F1-score of up to 10% for actionable information types. Moreover, we found that entity-masking reduces the effect of overfitting to in-domain events and enables improvements in cross-event generalization.
2014
pdf
bib
abs
Erlangen-CLP: A Large Annotated Corpus of Speech from Children with Cleft Lip and Palate
Tobias Bocklet
|
Andreas Maier
|
Korbinian Riedhammer
|
Ulrich Eysholdt
|
Elmar Nöth
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper we describe Erlangen-CLP, a large speech database of children with Cleft Lip and Palate. More than 800 German children with CLP (most of them between 4 and 18 years old) and 380 age matched control speakers spoke the semi-standardized PLAKSS test that consists of words with all German phonemes in different positions. So far 250 CLP speakers were manually transcribed, 120 of these were analyzed by a speech therapist and 27 of them by four additional therapists. The tharapists marked 6 different processes/criteria like pharyngeal backing and hypernasality which typically occur in speech of people with CLP. We present detailed statistics about the the marked processes and the inter-rater agreement.
2010
pdf
bib
abs
FAU IISAH Corpus – A German Speech Database Consisting of Human-Machine and Human-Human Interaction Acquired by Close-Talking and Far-Distance Microphones
Werner Spiegl
|
Korbinian Riedhammer
|
Stefan Steidl
|
Elmar Nöth
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In this paper the FAU IISAH corpus and its recording conditions are described: a new speech database consisting of human-machine and human-human interaction recordings. Beside close-talking microphones for the best possible audio quality of the recorded speech, far-distance microphones were used to acquire the interaction and communication. The recordings took place during a Wizard-of-Oz experiment in the intelligent, senior-adapted house (ISA-House). That is a living room with a speech controlled home assistance system for elderly people, based on a dialogue system, which is able to process spontaneous speech. During the studies in the ISA-House more than eight hours of interaction data were recorded including 3 hours and 27 minutes of spontaneous speech. The data were annotated in terms of human-human (off-talk) and human-machine (on-talk) interaction. The test persons used 2891 turns of off-talk and 2752 turns of on-talk including 1751 different words. Still in progress is the analysis under statistical and linguistical aspects.