Sarah Ebling


2022

Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)
Sarah Ebling | Emily Prud'hommeaux | Preethi Vaidyanathan
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)

A Multilingual Simplified Language News Corpus
Renate Hauser | Jannis Vamvas | Sarah Ebling | Martin Volk
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

Simplified language news articles are being offered by specialized web portals in several countries. The thousands of articles that have been published over the years are a valuable resource for natural language processing, especially for efforts towards automatic text simplification. In this paper, we present SNIML, a large multilingual corpus of news in simplified language. The corpus contains 13k simplified news articles written in one of six languages: Finnish, French, Italian, Swedish, English, and German. All articles are shared under open licenses that permit academic use. The level of text simplification varies depending on the news portal. We believe that even though SNIML is not a parallel corpus, it can be useful as a complement to the more homogeneous but often smaller corpora of news in the simplified variety of one language that are currently in use.

2021

A New Dataset and Efficient Baselines for Document-level Text Simplification in German
Annette Rios | Nicolas Spring | Tannon Kew | Marek Kostrzewa | Andreas Säuberli | Mathias Müller | Sarah Ebling
Proceedings of the Third Workshop on New Frontiers in Summarization

The task of document-level text simplification is very similar to summarization with the additional difficulty of reducing complexity. We introduce a newly collected data set of German texts from the Swiss news magazine 20 Minuten (‘20 Minutes’) that consists of full articles paired with simplified summaries. Furthermore, we present experiments on automatic text simplification with the pretrained multilingual mBART and a modified version thereof that is more memory-friendly, using both our new data set and existing simplification corpora. Our modifications of mBART let us train at a lower memory cost without much loss in performance; in fact, the smaller mBART even improves over the standard model in a setting with multiple simplification levels.

Exploring German Multi-Level Text Simplification
Nicolas Spring | Annette Rios | Sarah Ebling
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

We report on experiments in automatic text simplification (ATS) for German with multiple simplification levels along the Common European Framework of Reference for Languages (CEFR), simplifying standard German into levels A1, A2 and B1. For that purpose, we investigate the use of source labels and pretraining on standard German, allowing us to simplify standard language to a specific CEFR level. We show that these approaches are especially effective in low-resource scenarios, where we are able to outperform a standard transformer baseline. Moreover, we introduce copy labels, which we show can help the model make a distinction between sentences that require further modifications and sentences that can be copied as-is.

The Myth of Signing Avatars
John C. McDonald | Rosalee Wolfe | Eleni Efthimiou | Evita Fontinea | Frankie Picron | Davy Van Landuyt | Tina Sioen | Annelies Braffort | Michael Filhol | Sarah Ebling | Thomas Hanke | Verena Krausneker
Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)

Development of automatic translation between signed and spoken languages has lagged behind the development of automatic translation between spoken languages, but it is a common misperception that extending machine translation techniques to include signed languages should be a straightforward process. A contributing factor is the lack of an acceptable method for displaying sign language apart from interpreters on video. This position paper examines the challenges of displaying a signed language as a target in automatic translation, analyses the underlying causes and suggests strategies to develop display technologies that are acceptable to sign language communities.

2020

Benchmarking Automated Review Response Generation for the Hospitality Domain
Tannon Kew | Michael Amsler | Sarah Ebling
Proceedings of Workshop on Natural Language Processing in E-Commerce

Online customer reviews are of growing importance for many businesses in the hospitality industry, particularly restaurants and hotels. Managerial responses to such reviews provide businesses with the opportunity to influence the public discourse and to attain improved ratings over time. However, responding to each and every review is a time-consuming endeavour. Therefore, we investigate automatic generation of review responses in the hospitality domain for two languages, English and German. We apply an existing system, originally proposed for review response generation for smartphone apps. This approach employs an extended neural network sequence-to-sequence architecture and performs well in the original domain. However, as shown through our experiments, when applied to a new domain, such as hospitality, performance drops considerably. Therefore, we analyse potential causes for the differences in performance and provide evidence to suggest that review response generation in the hospitality domain is a more challenging task and thus requires further study and additional domain adaptation techniques.

A Corpus for Automatic Readability Assessment and Text Simplification of German
Alessia Battisti | Dominik Pfütze | Andreas Säuberli | Marek Kostrzewa | Sarah Ebling
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification for German, the first of its kind for this language. The corpus is compiled from web sources and consists of parallel as well as monolingual-only (simplified German) data amounting to approximately 6,200 documents (nearly 211,000 sentences). As a unique feature, the corpus contains information on text structure (e.g., paragraphs, lines), typography (e.g., font type, font style), and images (content, position, and dimensions). While the importance of considering such information in machine learning tasks involving simplified language, such as readability assessment, has repeatedly been stressed in the literature, we provide empirical evidence for its benefit. We also demonstrate the added value of leveraging monolingual-only data for automatic text simplification via machine translation through applying back-translation, a data augmentation technique.

Benchmarking Data-driven Automatic Text Simplification for German
Andreas Säuberli | Sarah Ebling | Martin Volk
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Automatic text simplification is an active research area, and initial systems exist for English, Spanish, Portuguese, and Italian. For German, no data-driven approach exists to date, owing to a lack of training data. In this paper, we present a parallel corpus of news items in German with corresponding simplifications on two complexity levels. The simplifications have been produced according to a well-documented set of guidelines. We then report on experiments in automatically simplifying the German news items using state-of-the-art neural machine translation techniques. We demonstrate that despite our small parallel corpus, our neural models were able to learn essential features of simplified language, such as lexical substitutions, deletion of less relevant words and phrases, and sentence shortening.

2018

SMILE Swiss German Sign Language Dataset
Sarah Ebling | Necati Cihan Camgöz | Penny Boyes Braem | Katja Tissi | Sandra Sidler-Miserez | Stephanie Stoll | Simon Hadfield | Tobias Haug | Richard Bowden | Sandrine Tornay | Marzieh Razavi | Mathew Magimai-Doss
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

An Open Web Platform for Rule-Based Speech-to-Sign Translation
Manny Rayner | Pierrette Bouillon | Sarah Ebling | Johanna Gerlach | Irene Strasly | Nikos Tsourakis
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Bridging the gap between sign language machine translation and sign language animation using sequence classification
Sarah Ebling | Matt Huenerfauth
Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies

Synthesizing the finger alphabet of Swiss German Sign Language and evaluating the comprehensibility of the resulting animations
Sarah Ebling | Rosalee Wolfe | Jerry Schnepp | Souad Baowidan | John McDonald | Robyn Moncrief | Sandra Sidler-Miserez | Katja Tissi
Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies

2013

Building a German/Simple German Parallel Corpus for Automatic Text Simplification
David Klaper | Sarah Ebling | Martin Volk
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations

2011

Combining Semantic and Syntactic Generalization in Example-Based Machine Translation
Sarah Ebling | Andy Way | Martin Volk | Sudip Kumar Naskar
Proceedings of the 15th Annual conference of the European Association for Machine Translation