Alessia Battisti
2026
DETECT: Determining Ease and Textual Clarity of German Text Simplifications
Maria Korobeynikova | Alessia Battisti | Lukas Fischer | Yingqiang Gao
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Current evaluation of German automatic text simplification (ATS) relies on general-purpose metrics such as SARI, BLEU, and BERTScore, which insufficiently capture simplification quality in terms of simplicity, meaning preservation, and fluency. While specialized metrics like LENS have been developed for English, corresponding efforts for German have lagged behind due to the absence of human-annotated corpora. To close this gap, we introduce DETECT, the first German-specific metric that holistically evaluates ATS quality across all three dimensions of simplicity, meaning preservation, and fluency, and is trained entirely on synthetic large language model (LLM) responses. Our approach adapts the LENS framework to German and extends it with (i) a pipeline for generating synthetic quality scores via LLMs, enabling dataset creation without human annotation, and (ii) an LLM-based refinement step for aligning grading criteria with simplification requirements. To the best of our knowledge, we also construct the largest German human evaluation dataset for text simplification to validate our metric directly. Experimental results show that DETECT achieves substantially higher correlations with human judgments than widely used ATS metrics, with particularly strong gains in meaning preservation and fluency. Beyond ATS, our findings highlight both the potential and the limitations of LLMs for automatic evaluation and provide transferable guidelines for general language accessibility tasks.
2025
ConLoan: A Contrastive Multilingual Dataset for Evaluating Loanwords
Sina Ahmadi | Micha David Hess | Elena Álvarez-Mellado | Alessia Battisti | Cui Ding | Anne Göhring | Yingqiang Gao | Zifan Jiang | Andrianos Michail | Peshmerge Morad | Joel Niklaus | Maria Christina Panagiotopoulou | Stefano Perrella | Juri Opitz | Anastassia Shaitarova | Rico Sennrich
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Lexical borrowing, the adoption of words from one language into another, is a ubiquitous linguistic phenomenon influenced by geopolitical, societal, and technological factors. This paper introduces ConLoan, a novel contrastive dataset comprising sentences with and without loanwords across 10 languages. Through systematic evaluation using this dataset, we investigate how state-of-the-art machine translation and language models process loanwords compared to their native alternatives. Our experiments reveal that these systems show systematic preferences for loanwords over native terms and exhibit varying performance across languages. These findings provide valuable insights for developing more linguistically robust NLP systems.
2024
Person Identification from Pose Estimates in Sign Language
Alessia Battisti | Emma van den Bold | Anne Göhring | Franz Holzknecht | Sarah Ebling
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources
Advancing Annotation for Continuous Data in Swiss German Sign Language
Alessia Battisti | Katja Tissi | Sandra Sidler-Miserez | Sarah Ebling
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources
Automatic Annotation Elaboration as Feedback to Sign Language Learners
Alessia Battisti | Sarah Ebling
Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII)
Beyond enabling linguistic analyses, linguistic annotations may serve as training material for developing automatic language assessment models as well as for providing textual feedback to language learners. Yet these linguistic annotations in their original form are often not easily comprehensible for learners. In this paper, we explore the utilization of GPT-4, as an example of a large language model (LLM), to process linguistic annotations into clear and understandable feedback on their productions for language learners, specifically sign language learners.
2023
First WMT Shared Task on Sign Language Translation (WMT-SLT22)
Mathias Müller | Sarah Ebling | Eleftherios Avramidis | Alessia Battisti | Michèle Berger | Richard Bowden | Annelies Braffort | Necati Cihan Camgoz | Cristina España-Bonet | Roman Grundkiewicz | Zifan Jiang | Oscar Koller | Amit Moryossef | Regula Perrollaz | Sabine Reinhard | Annette Rios Gonzales | Dimitar Shterionov | Sandra Sidler-Miserez | Katja Tissi | Davy Van Landuyt
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
This paper is a brief summary of the First WMT Shared Task on Sign Language Translation (WMT-SLT22), a project partly funded by EAMT. The focus of this shared task is automatic translation between signed and spoken languages. Details can be found on our website (https://www.wmt-slt.com/) or in the findings paper (Müller et al., 2022).
2022
Findings of the First WMT Shared Task on Sign Language Translation (WMT-SLT22)
Mathias Müller | Sarah Ebling | Eleftherios Avramidis | Alessia Battisti | Michèle Berger | Richard Bowden | Annelies Braffort | Necati Cihan Camgöz | Cristina España-Bonet | Roman Grundkiewicz | Zifan Jiang | Oscar Koller | Amit Moryossef | Regula Perrollaz | Sabine Reinhard | Annette Rios | Dimitar Shterionov | Sandra Sidler-Miserez | Katja Tissi
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents the results of the First WMT Shared Task on Sign Language Translation (WMT-SLT22). This shared task is concerned with automatic translation between signed and spoken languages. The task is novel in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT). The task featured two tracks, translating from Swiss German Sign Language (DSGS) to German and vice versa. Seven teams participated in this first edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora, reproducible baseline systems and new protocols and software for human evaluation. Finally, the task also resulted in the first publicly available set of system outputs and human evaluation scores for sign language translation.
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer | Isaac Caswell | Lisa Wang | Ahsan Wahab | Daan van Esch | Nasanbayar Ulzii-Orshikh | Allahsera Tapo | Nishant Subramani | Artem Sokolov | Claytone Sikasote | Monang Setyawan | Supheakmungkol Sarin | Sokhar Samb | Benoît Sagot | Clara Rivera | Annette Rios | Isabel Papadimitriou | Salomey Osei | Pedro Ortiz Suarez | Iroro Orife | Kelechi Ogueji | Andre Niyongabo Rubungo | Toan Q. Nguyen | Mathias Müller | André Müller | Shamsuddeen Hassan Muhammad | Nanda Muhammad | Ayanda Mnyakeni | Jamshidbek Mirzakhalov | Tapiwanashe Matangira | Colin Leong | Nze Lawson | Sneha Kudugunta | Yacine Jernite | Mathias Jenny | Orhan Firat | Bonaventure F. P. Dossou | Sakhile Dlamini | Nisansa de Silva | Sakine Çabuk Ballı | Stella Biderman | Alessia Battisti | Ahmed Baruwa | Ankur Bapna | Pallavi Baljekar | Israel Abebe Azime | Ayodele Awokoya | Duygu Ataman | Orevaoghene Ahia | Oghenefego Ahia | Sweta Agrawal | Mofetoluwa Adeyemi
Transactions of the Association for Computational Linguistics, Volume 10
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases.
2020
A Corpus for Automatic Readability Assessment and Text Simplification of German
Alessia Battisti | Dominik Pfütze | Andreas Säuberli | Marek Kostrzewa | Sarah Ebling
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification for German, the first of its kind for this language. The corpus is compiled from web sources and consists of parallel as well as monolingual-only (simplified German) data amounting to approximately 6,200 documents (nearly 211,000 sentences). As a unique feature, the corpus contains information on text structure (e.g., paragraphs, lines), typography (e.g., font type, font style), and images (content, position, and dimensions). While the importance of considering such information in machine learning tasks involving simplified language, such as readability assessment, has repeatedly been stressed in the literature, we provide empirical evidence for its benefit. We also demonstrate the added value of leveraging monolingual-only data for automatic text simplification via machine translation through applying back-translation, a data augmentation technique.
2019
Co-authors
- Sarah Ebling 7
- Zifan Jiang 3
- Mathias Müller 3
- Annette Rios Gonzales 3
- Sandra Sidler-Miserez 3
- Katja Tissi 3
- Eleftherios Avramidis 2
- Michèle Berger 2
- Richard Bowden 2
- Annelies Braffort 2
- Necati Cihan Camgöz 2
- Cristina España-Bonet 2
- Yingqiang Gao 2
- Roman Grundkiewicz 2
- Anne Göhring 2
- Oscar Koller 2
- Amit Moryossef 2
- Regula Perrollaz 2
- Sabine Reinhard 2
- Dimitar Shterionov 2
- Mofetoluwa Adeyemi 1
- Sweta Agrawal 1
- Orevaoghene Ahia 1
- Oghenefego Ahia 1
- Sina Ahmadi 1
- Duygu Ataman 1
- Ayodele Awokoya 1
- Israel Abebe Azime 1
- Pallavi Baljekar 1
- Ankur Bapna 1
- Ahmed Baruwa 1
- Stella Biderman 1
- Isaac Caswell 1
- Nisansa De Silva 1
- Cui Ding 1
- Sakhile Dlamini 1
- Bonaventure F. P. Dossou 1
- Orhan Firat 1
- Lukas Fischer 1
- Micha David Hess 1
- Franz Holzknecht 1
- Mathias Jenny 1
- Yacine Jernite 1
- Maria Korobeynikova 1
- Marek Kostrzewa 1
- Julia Kreutzer 1
- Sneha Kudugunta 1
- Davy Van Landuyt 1
- Nze Lawson 1
- Colin Leong 1
- Tapiwanashe Matangira 1
- Andrianos Michail 1
- Jamshidbek Mirzakhalov 1
- Ayanda Mnyakeni 1
- Peshmerge Morad 1
- Shamsuddeen Hassan Muhammad 1
- Nanda Muhammad 1
- André Müller 1
- Toan Q. Nguyen 1
- Joel Niklaus 1
- Kelechi Ogueji 1
- Juri Opitz 1
- Iroro Orife 1
- Pedro Ortiz Suarez 1
- Salomey Osei 1
- Maria Christina Panagiotopoulou 1
- Isabel Papadimitriou 1
- Stefano Perrella 1
- Dominik Pfütze 1
- Clara Rivera 1
- Andre Niyongabo Rubungo 1
- Benoît Sagot 1
- Sokhar Samb 1
- Supheakmungkol Sarin 1
- Rico Sennrich 1
- Monang Setyawan 1
- Anastassia Shaitarova 1
- Claytone Sikasote 1
- Artem Sokolov 1
- Nishant Subramani 1
- Andreas Säuberli 1
- Allahsera Tapo 1
- Nasanbayar Ulzii-Orshikh 1
- Martin Volk 1
- Ahsan Wahab 1
- Lisa Wang 1
- Daan van Esch 1
- Emma van den Bold 1
- Elena Álvarez-Mellado 1
- Sakine Çabuk Ballı 1