Sabrina Stehwien
2020

The Little Prince in 26 Languages: Towards a Multilingual Neuro-Cognitive Corpus
Sabrina Stehwien | Lena Henke | John Hale | Jonathan Brennan | Lars Meyer
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources

We present the Le Petit Prince Corpus (LPPC), a multilingual resource for research in (computational) psycho- and neurolinguistics. The corpus consists of the children’s story The Little Prince in 26 languages. The dataset is being built using state-of-the-art methods for speech and language processing and electroencephalography (EEG). The planned release of the LPPC dataset will include raw text annotated with dependency graphs in the Universal Dependencies standard, a near-natural-sounding synthetic spoken subset, as well as EEG recordings. We will use this corpus to conduct neurolinguistic studies that generalize across a wide range of languages, overcoming the typological constraints of traditional approaches. The planned release of the LPPC combines linguistic and EEG data for many languages using fully automatic methods, and thus constitutes a readily extendable resource that supports cross-linguistic and cross-disciplinary research.
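Below is a minimal sketch of how such dependency-annotated text could be consumed, assuming the released annotations follow the standard CoNLL-U format used by Universal Dependencies; the file name and release layout are hypothetical, not taken from the abstract.

```python
# Minimal CoNLL-U reader sketch (stdlib only). Assumes the standard ten-column
# CoNLL-U layout: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC.
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    form: str      # surface word form
    upos: str      # universal POS tag
    head: int      # index of the syntactic head (0 = root)
    deprel: str    # dependency relation label


def read_conllu(path: str) -> List[List[Token]]:
    """Read a CoNLL-U file into a list of sentences (lists of tokens)."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                     # blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
                continue
            if line.startswith("#"):         # skip sentence-level comments
                continue
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue                     # skip multiword/empty tokens
            current.append(Token(form=cols[1], upos=cols[3],
                                 head=int(cols[6]), deprel=cols[7]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical usage (file name is illustrative only):
# sentences = read_conllu("lppc_french.conllu")
# print(sentences[0][0].form, sentences[0][0].deprel)
```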

2018

German Radio Interviews: The GRAIN Release of the SFB732 Silver Standard Collection
Katrin Schweitzer | Kerstin Eckart | Markus Gärtner | Agnieszka Falenska | Arndt Riester | Ina Rösiger | Antje Schweitzer | Sabrina Stehwien | Jonas Kuhn
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Improving coreference resolution with automatically predicted prosodic information
Ina Roesiger | Sabrina Stehwien | Arndt Riester | Ngoc Thang Vu
Proceedings of the Workshop on Speech-Centric Natural Language Processing

Adding manually annotated prosodic information, specifically pitch accents and phrasing, to the typical text-based feature set for coreference resolution has previously been shown to have a positive effect on German data. Practical applications on spoken language, however, would have to rely on automatically predicted prosodic information. In this paper we use a convolutional neural network (CNN) model to predict pitch accents (and phrase boundaries) from acoustic features extracted from the speech signal. After assessing the quality of these automatic prosodic annotations, we show that they also significantly improve coreference resolution.
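As an illustration of the general idea, here is a minimal sketch of a CNN that maps a window of frame-level acoustic features for one word to a binary pitch-accent decision. This is not the authors' exact architecture; the feature dimension, frame count, and layer sizes are illustrative assumptions.

```python
# Sketch of a CNN pitch-accent classifier over frame-level acoustic features
# (e.g. F0, energy, and related descriptors). Hyperparameters are assumptions.
import torch
import torch.nn as nn


class PitchAccentCNN(nn.Module):
    def __init__(self, n_features: int = 6, n_classes: int = 2):
        super().__init__()
        # Convolve across the time axis of the per-frame acoustic features.
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Max-pool over the whole time axis, then classify the word.
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.classifier = nn.Linear(32, n_classes)  # accented vs. unaccented

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, n_frames)
        h = self.conv(x)
        h = self.pool(h).squeeze(-1)   # (batch, 32)
        return self.classifier(h)      # logits over {unaccented, accented}


if __name__ == "__main__":
    model = PitchAccentCNN()
    dummy = torch.randn(4, 6, 100)     # 4 words, 6 features, 100 frames each
    print(model(dummy).shape)          # torch.Size([4, 2])
```

The pooled CNN output could then be fed into a coreference system as an additional prosodic feature per mention, in the spirit of the experiment described in the abstract.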