Roland Roller


2023

Findings of the WMT 2023 Biomedical Translation Shared Task: Evaluation of ChatGPT 3.5 as a Comparison System
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol | Rachel Bawden | Giorgio Maria Di Nunzio | Roland Roller | Philippe Thomas | Federica Vezzani | Maika Vicente Navarro | Lana Yeganova | Dina Wiemann | Cristian Grozea
Proceedings of the Eighth Conference on Machine Translation

We present an overview of the Biomedical Translation Task that was part of the Eighth Conference on Machine Translation (WMT23). The aim of the task was the automatic translation of biomedical abstracts from the PubMed database. It included twelve language directions, namely, French, Spanish, Portuguese, Italian, German, and Russian, from and into English. We received submissions from 18 systems, covering all the test sets that we released. Our comparison system was based on ChatGPT 3.5 and performed very well in comparison to many of the submissions.

Clinical Text Anonymization, its Influence on Downstream NLP Tasks and the Risk of Re-Identification
Iyadh Ben Cheikh Larbi | Aljoscha Burchardt | Roland Roller
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

While text-based medical applications have become increasingly prominent, access to clinical data remains a major concern. To resolve this issue, further de-identification and anonymization of the data are required. This might, however, alter the contextual information within the clinical texts and therefore influence the learning and performance of possible language models. This paper systematically analyses the potential effects of various anonymization techniques on the performance of state-of-the-art machine learning models based on several datasets corresponding to five different NLP tasks. On this basis, we derive insightful findings and recommendations concerning text anonymization with regard to the performance of machine learning models. In addition, we present a simple re-identification attack applied to the anonymized text data, which can break the anonymization.

2022

Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports
Mariana Neves | Antonio Jimeno Yepes | Amy Siu | Roland Roller | Philippe Thomas | Maika Vicente Navarro | Lana Yeganova | Dina Wiemann | Giorgio Maria Di Nunzio | Federica Vezzani | Christel Gerardin | Rachel Bawden | Darryl Johan Estrada | Salvador Lima-lopez | Eulalia Farre-maduel | Martin Krallinger | Cristian Grozea | Aurelie Neveol
Proceedings of the Seventh Conference on Machine Translation (WMT)

In the seventh edition of the WMT Biomedical Task, we addressed a total of seven language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, and English/Italian. This year’s test sets covered three types of biomedical text genre. In addition to scientific abstracts and terminology items used in previous editions, we released test sets of clinical cases. The evaluation of clinical case translations was given special attention by involving clinicians in the preparation of reference translations and manual evaluation. For the main MEDLINE test sets, we received a total of 609 submissions from 37 teams. For the ClinSpEn sub-task, we had the participation of five teams.

Subjective Text Complexity Assessment for German
Laura Seiffe | Fares Kallel | Sebastian Möller | Babak Naderi | Roland Roller
Proceedings of the Thirteenth Language Resources and Evaluation Conference

For different reasons, text can be difficult to read and understand for many people, especially if the text’s language is too complex. In order to provide suitable text for the target audience, it is necessary to measure its complexity. In this paper we describe subjective experiments to assess the readability of German text. We compile a new corpus of sentences provided by a German IT service provider. The sentences are annotated with the subjective complexity ratings by two groups of participants, namely experts and non-experts for that text domain. We then extract an extensive set of linguistically motivated features that are supposedly interacting with complexity perception. We show that a linear regression model with a subset of these features can be a very good predictor of text complexity.

An Annotated Corpus of Textual Explanations for Clinical Decision Support
Roland Roller | Aljoscha Burchardt | Nils Feldhus | Laura Seiffe | Klemens Budde | Simon Ronicke | Bilgin Osmanodja
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In recent years, machine learning for clinical decision support has gained more and more attention. In order to introduce such applications into clinical practice, good performance might be essential; however, the aspect of trust should not be underestimated. For the treating physician using such a system and being (legally) responsible for the decision made, it is particularly important to understand the system’s recommendation. To provide insights into a model’s decision, various techniques from the field of explainability (XAI) have been proposed, whose output is often not targeted at the domain experts who want to use the model. To close this gap, in this work, we explore what explanations could look like in the future. To this end, this work presents a dataset of textual explanations in the context of decision support. Within a reader study, human physicians estimated the likelihood of possible negative patient outcomes in the near future and justified each decision with a few sentences. Using those sentences, we created a novel corpus, annotated with different semantic layers. Moreover, we provide an analysis of how those explanations are constructed, and how they change depending on the physician, on the estimated risk, and also in comparison to an automatic clinical decision support system with feature importance.

Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient’s Perspective
Lisa Raithel | Philippe Thomas | Roland Roller | Oliver Sapina | Sebastian Möller | Pierre Zweigenbaum
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common in social media data in this domain, the class labels of the corpus are very imbalanced. This and a high topic imbalance make it a very challenging dataset, since often, the same symptom can have several causes and is not always related to a medication intake. We aim to encourage further multi-lingual efforts in the domain of ADR detection and provide preliminary experiments for binary classification using different methods of zero- and few-shot learning based on a multi-lingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1-score of 37.52 for the positive class. We make the dataset and models publicly available for the community.

2021

Findings of the WMT 2021 Biomedical Translation Shared Task: Summaries of Animal Experiments as New Test Set
Lana Yeganova | Dina Wiemann | Mariana Neves | Federica Vezzani | Amy Siu | Inigo Jauregi Unanue | Maite Oronoz | Nancy Mah | Aurélie Névéol | David Martinez | Rachel Bawden | Giorgio Maria Di Nunzio | Roland Roller | Philippe Thomas | Cristian Grozea | Olatz Perez-de-Viñaspre | Maika Vicente Navarro | Antonio Jimeno Yepes
Proceedings of the Sixth Conference on Machine Translation

In the sixth edition of the WMT Biomedical Task, we addressed a total of eight language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, English/Italian, and English/Basque. Further, our tests were composed of three types of textual test sets. New this year, we released a test set of summaries of animal experiments, in addition to the test sets of scientific abstracts and terminologies. We received a total of 107 submissions from 15 teams from 6 countries.

2020

From Witch’s Shot to Music Making Bones - Resources for Medical Laymen to Technical Language and Vice Versa
Laura Seiffe | Oliver Marten | Michael Mikhailov | Sven Schmeier | Sebastian Möller | Roland Roller
Proceedings of the Twelfth Language Resources and Evaluation Conference

Many people share information in social media or forums, such as the food they eat, the sports activities they do, or the events they have visited. The information we share online reveals, directly or indirectly, information about our lifestyle and health situation, particularly when the text input gets longer or multiple messages can be linked to each other. This information can then be used to detect possible risk factors of diseases or adverse drug reactions of medications. However, as most people are not medical experts, the language used might be more descriptive than the precise medical expressions that medics use. To detect and use this relevant information, laymen language has to be translated and/or linked to the corresponding medical concept. This work presents baseline data sources in order to address this challenge for the German language. We introduce a new dataset which annotates medical laymen and technical expressions in a patient forum, along with a set of medical synonyms and definitions, and present first baseline results on the data.

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages
Rachel Bawden | Giorgio Maria Di Nunzio | Cristian Grozea | Inigo Jauregi Unanue | Antonio Jimeno Yepes | Nancy Mah | David Martinez | Aurélie Névéol | Mariana Neves | Maite Oronoz | Olatz Perez-de-Viñaspre | Massimo Piccardi | Roland Roller | Amy Siu | Philippe Thomas | Federica Vezzani | Maika Vicente Navarro | Dina Wiemann | Lana Yeganova
Proceedings of the Fifth Conference on Machine Translation

Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five language pairs were previously addressed in past editions of the shared task, namely, English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional language pairs were also introduced this year: English/Russian, English/Italian, and English/Basque. The task addressed the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations in terms of automatic scores and qualitative evaluations, compared to previous years.

2019

Exploring Diachronic Changes of Biomedical Knowledge using Distributed Concept Representations
Gaurav Vashisth | Jan-Niklas Voigt-Antons | Michael Mikhailov | Roland Roller
Proceedings of the 18th BioNLP Workshop and Shared Task

In research, best practices can change over time as new discoveries are made and novel methods are implemented. Scientific publications reporting the latest facts and the current state of the art may be outdated after some years or even proved to be false. A publication usually sheds light only on the knowledge of the period in which it was published. Thus, the aspect of time can play an essential role in the reliability of the presented information. In Natural Language Processing, many methods focus on information extraction from text, such as detecting entities and their relationships to each other. Those methods mostly focus on the facts presented in the text itself and not on the aspects of knowledge which change over time. This work instead examines the evolution of biomedical knowledge over time using scientific literature in terms of diachronic change. Mainly, the usage of temporal and distributional concept representations is explored and evaluated by a proof-of-concept.

2018

Football and Beer - a Social Media Analysis on Twitter in Context of the FIFA Football World Cup 2018
Roland Roller | Philippe Thomas | Sven Schmeier
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

In many societies, alcohol is a legal, common, and socially accepted recreational substance. Alcohol consumption often comes along with social events as it helps people to increase their sociability and to overcome their inhibitions. On the other hand, we know that increased alcohol consumption can lead to serious health issues, such as cancer, cardiovascular diseases and diseases of the digestive system, to mention a few. This work examines alcohol consumption during the FIFA Football World Cup 2018, particularly the usage of alcohol-related information on Twitter. For this, we analyse the tweeting behaviour and show that the tournament strongly increases the interest in beer. Furthermore, we show that countries that had to leave the tournament at an early stage might have done something good for their fans, as the interest in beer decreased again.

2017

Annotation of Entities and Relations in Spanish Radiology Reports
Viviana Cotik | Darío Filippo | Roland Roller | Hans Uszkoreit | Feiyu Xu
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Radiology reports express the results of a radiology study and contain information about anatomical entities, findings, measures and impressions of the medical doctor. The use of information extraction techniques can help physicians to access this information in order to understand data and to infer further knowledge. Supervised machine learning methods are very popular for information extraction, but are usually domain and language dependent. To train new classification models, annotated data is required. Moreover, annotated data is also required as an evaluation resource for information extraction algorithms. However, one major drawback of processing clinical data is the low availability of annotated datasets. For this reason, we performed a manual annotation of radiology reports written in Spanish. This paper presents the corpus, the annotation schema, the annotation guidelines and further insights into the data.

Findings of the WMT 2017 Biomedical Translation Shared Task
Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Karin Verspoor | Ondřej Bojar | Arthur Boyer | Cristian Grozea | Barry Haddow | Madeleine Kittner | Yvonne Lichtblau | Pavel Pecina | Roland Roller | Rudolf Rosa | Amy Siu | Philippe Thomas | Saskia Trescher
Proceedings of the Second Conference on Machine Translation

2016

A fine-grained corpus annotation schema of German nephrology records
Roland Roller | Hans Uszkoreit | Feiyu Xu | Laura Seiffe | Michael Mikhailov | Oliver Staeck | Klemens Budde | Fabian Halleck | Danilo Schmidt
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

In this work we present a fine-grained annotation schema to detect named entities in German clinical data of chronically ill patients with kidney diseases. The annotation schema is driven by the needs of our clinical partners and the linguistic aspects of the German language. In order to generate annotations within a short period, the work also presents a semi-automatic annotation which uses additional sources of knowledge, such as UMLS, to pre-annotate concepts in advance. The presented schema will be used to apply novel techniques from natural language processing and machine learning to support doctors treating their patients by improved information access from unstructured German texts.

Negation Detection in Clinical Reports Written in German
Viviana Cotik | Roland Roller | Feiyu Xu | Hans Uszkoreit | Klemens Budde | Danilo Schmidt
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

An important subtask in clinical text mining tries to identify whether a clinical finding is expressed as present, absent or unsure in a text. This work presents a system for detecting mentions of clinical findings that are negated or just speculated. The system has been applied to two different types of German clinical texts: clinical notes and discharge summaries. Our approach is built on top of NegEx, a well-known algorithm for identifying non-factive mentions of medical findings. In this work, we adjust a previous adaptation of NegEx to German and evaluate the system on our data to detect negation and speculation. The results are compared to a baseline algorithm and are analyzed for both types of clinical documents. Our system achieves an F1-Score above 0.9 on both types of reports.

2015

Held-out versus Gold Standard: Comparison of Evaluation Strategies for Distantly Supervised Relation Extraction from Medline abstracts
Roland Roller | Mark Stevenson
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

Making the most of limited training data using distant supervision
Roland Roller | Mark Stevenson
Proceedings of BioNLP 15

Improving distant supervision using inference learning
Roland Roller | Eneko Agirre | Aitor Soroa | Mark Stevenson
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Applying UMLS for Distantly Supervised Relation Detection
Roland Roller | Mark Stevenson
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

2013

Identification of Genia Events using Multiple Classifiers
Roland Roller | Mark Stevenson
Proceedings of the BioNLP Shared Task 2013 Workshop