Roland Roller - ACL Anthology

Roland Roller

2026

Infherno: End-to-end Agent-based FHIR Resource Synthesis from Free-form Clinical Notes
Johann Frei | Nils Feldhus | Lisa Raithel | Roland Roller | Alexander Meyer | Frank Kramer
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

For clinical data integration and healthcare services, the HL7 FHIR standard has established itself as a desirable format for interoperability between complex health data. Previous attempts at automating the translation from free-form clinical notes into structured FHIR resources address narrowly defined tasks and rely on modular approaches or LLMs with instruction tuning and constrained decoding. As those solutions frequently suffer from limited generalizability and structural inconformity, we propose an end-to-end framework powered by LLM agents, code execution, and healthcare terminology database tools to address these issues. Our solution, called Infherno, is designed to adhere to the FHIR document schema and competes well with a human baseline in predicting FHIR resources from unstructured text. The implementation features a front end for custom and synthetic data and both local and proprietary models, supporting clinical data integration processes and interoperability across institutions. Gemini 2.5-Pro excels in our evaluation on synthetic and clinical datasets, yet ambiguity and feasibility of collecting ground-truth data remain open problems.

2025

One Size Fits None: Rethinking Fairness in Medical AI
Roland Roller | Michael Hahn | Ajay Madhavan Ravichandran | Bilgin Osmanodja | Florian Oetke | Zeineb Sassi | Aljoscha Burchardt | Klaus Netter | Klemens Budde | Anne Herrmann | Tobias Strapatsas | Peter Dabrock | Sebastian Möller
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Machine learning (ML) models are increasingly used to support clinical decision-making. However, real-world medical datasets are often noisy, incomplete, and imbalanced, leading to performance disparities across patient subgroups. These differences raise fairness concerns, particularly when they reinforce existing disadvantages for marginalized groups. In this work, we analyze several medical prediction tasks and demonstrate how model performance varies with patient characteristics. While ML models may demonstrate good overall performance, we argue that subgroup-level evaluation is essential before integrating them into clinical workflows. By conducting a performance analysis at the subgroup level, differences can be clearly identified—allowing, on the one hand, for performance disparities to be considered in clinical practice, and on the other hand, for these insights to inform the responsible development of more effective models. Thereby, our work contributes to a practical discussion around the subgroup-sensitive development and deployment of medical ML models and the interconnectedness of fairness and transparency.

EdTec-ItemGen: Enhancing Retrieval-Augmented Item Generation Through Key Point Extraction
Alonso Palomino | David Buschhüter | Roland Roller | Niels Pinkwart | Benjamin Paassen
Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM)

A major bottleneck in exam construction involves designing test items (i.e., questions) that accurately reflect key content from domain-aligned curricular materials. For instance, during formative assessments in vocational education and training (VET), exam designers must generate updated test items that assess student learning progress while covering the full breadth of topics in the curriculum. Large language models (LLMs) can partially support this process, but effective use requires careful prompting and task-specific understanding. We propose a new key point extraction method for retrieval-augmented item generation that enhances the process of generating test items with LLMs. We exhaustively evaluated our method using a TREC-RAG approach, finding that prompting LLMs with key content rather than directly using full curricular text passages significantly improves item quality regarding key information coverage by 8%. To demonstrate these findings, we release EdTec-ItemGen, a retrieval-augmented item generation demo tool to support item generation in education.

Mitigating Bias in Item Retrieval for Enhancing Exam Assembly in Vocational Education Services
Alonso Palomino | Andreas Fischer | David Buschhüter | Roland Roller | Niels Pinkwart | Benjamin Paassen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

In education, high-quality exams must cover broad specifications across diverse difficulty levels during the assembly and calibration of test items to effectively measure examinees’ competence. However, balancing the trade-off of selecting relevant test items while fulfilling exam specifications without bias is challenging, particularly when manual item selection and exam assembly rely on a pre-validated item base. To address this limitation, we propose a new mixed-integer programming re-ranking approach to improve relevance, while mitigating bias on an industry-grade exam assembly platform. We evaluate our approach by comparing it against nine bias mitigation re-ranking methods in 225 experiments on a real-world benchmark data set from vocational education services. Experimental results demonstrate a 17% relevance improvement with a 9% bias reduction when integrating sequential optimization techniques with improved contextual relevance augmentation and scoring using a large language model. Our approach bridges information retrieval and exam assembly, enhancing the human-in-the-loop exam assembly process while promoting unbiased exam design

Beyond De-Identification: A Structured Approach for Defining and Detecting Indirect Identifiers in Medical Texts
Ibrahim Baroud | Lisa Raithel | Sebastian Möller | Roland Roller
Proceedings of the Sixth Workshop on Privacy in Natural Language Processing

Sharing sensitive texts for scientific purposes requires appropriate techniques to protect the privacy of patients and healthcare personnel. Anonymizing textual data is particularly challenging due to the presence of diverse unstructured direct and indirect identifiers. To mitigate the risk of re-identification, this work introduces a schema of nine categories of indirect identifiers designed to account for different potential adversaries, including acquaintances, family members and medical staff. Using this schema, we annotate 100 MIMIC-III discharge summaries and propose baseline models for identifying indirect identifiers. We will release the annotation guidelines, annotation spans (6,199 annotations in total) and the corresponding MIMIC-III document IDs to support further research in this area.

2024

XAI for Better Exploitation of Text in Medical Decision Support
Ajay Madhavan Ravichandran | Julianna Grune | Nils Feldhus | Aljoscha Burchardt | Roland Roller | Sebastian Möller
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

In electronic health records, text data is considered a valuable resource as it complements a medical history and may contain information that cannot be easily included in tables. But why does the inclusion of clinical texts as additional input into multimodal models, not always significantly improve the performance of medical decision-support systems? Explainable AI (XAI) might provide the answer. We examine which information in text and structured data influences the performance of models in the context of multimodal decision support for biomedical tasks. Using data from an intensive care unit and targeting a mortality prediction task, we compare information that has been considered relevant by XAI methods to the opinion of a physician.

Towards ML-supported Triage Prediction in Real-World Emergency Room Scenarios
Faraz Maschhur | Klaus Netter | Sven Schmeier | Katrin Ostermann | Rimantas Palunis | Tobias Strapatsas | Roland Roller
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

In emergency wards, patients are prioritized by clinical staff according to the urgency of their medical condition. This can be achieved by categorizing patients into different labels of urgency ranging from immediate to not urgent. However, in order to train machine learning models offering support in this regard, there is more than approaching this as a multi-class problem. This work explores the challenges and obstacles of automatic triage using anonymized real-world multi-modal ambulance data in Germany.

Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain
Tomohiro Nishiyama | Lisa Raithel | Roland Roller | Pierre Zweigenbaum | Eiji Aramaki
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)

Since medical text cannot be shared easily due to privacy concerns, synthetic data bears much potential for natural language processing applications. In the context of social media and user-generated messages about drug intake and adverse drug effects, this work presents different methods to examine the authenticity of synthetic text. We conclude that the generated tweets are untraceable and show enough authenticity from the medical point of view to be used as a replacement for a real Twitter corpus. However, original data might still be the preferred choice as they contain much more diversity.

Dynamic Prompting: Large Language Models for Task Oriented Dialog
Jan Nehring | Akhil Juneja | Adnan Ahmad | Roland Roller | Dietrich Klakow
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Large Language Models show impressive results in many different applications, most notably in the context of question-answering and open dialog situations. However, it is still an open question how to use those models for task-oriented dialogs such as booking or customer information systems, and such. In this work, we propose Dynamic Prompting, an architecture for task-oriented dialog, integrating the benefits of Large Language Models and showcasing the approach on the MultiWOZ 2.2 dataset. Our architecture leads to a high task success rate, provides sensible and specific answers, and is resistant to hallucinations. Further, we show that Dynamic Prompting is able to answer questions that were not anticipated by the dialog systems designer and that it can correct several types of errors and other characteristics of the system.

Retrieval-Augmented Knowledge Integration into Language Models: A Survey
Yuxuan Chen | Daniel Röder | Justus-Jonas Erker | Leonhard Hennig | Philippe Thomas | Sebastian Möller | Roland Roller
Proceedings of the 1st Workshop on Towards Knowledgeable Language Models (KnowLLM 2024)

This survey analyses how external knowledge can be integrated into language models in the context of retrieval-augmentation.The main goal of this work is to give an overview of: (1) Which external knowledge can be augmented? (2) Given a knowledge source, how to retrieve from it and then integrate the retrieved knowledge? To achieve this, we define and give a mathematical formulation of retrieval-augmented knowledge integration (RAKI). We discuss retrieval and integration techniques separately in detail, for each of the following knowledge formats: knowledge graph, tabular and natural language.

A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Lisa Raithel | Hui-Syuan Yeh | Shuntaro Yada | Cyril Grouin | Thomas Lavergne | Aurélie Névéol | Patrick Paroubek | Philippe Thomas | Tomohiro Nishiyama | Sebastian Möller | Eiji Aramaki | Yuji Matsumoto | Roland Roller | Pierre Zweigenbaum
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types. It contributes to the development of real-world multilingual language models for healthcare. We provide statistics to highlight certain challenges associated with the corpus and conduct preliminary experiments resulting in strong baselines for extracting entities and relations between these entities, both within and across languages.

Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese
Lisa Raithel | Philippe Thomas | Bhuvanesh Verma | Roland Roller | Hui-Syuan Yeh | Shuntaro Yada | Cyril Grouin | Shoko Wakamiya | Eiji Aramaki | Sebastian Möller | Pierre Zweigenbaum
Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

This paper provides an overview of Task 2 from the Social Media Mining for Health 2024 shared task (#SMM4H 2024), which focused on Named Entity Recognition (NER, Subtask 2a) and the joint task of NER and Relation Extraction (RE, Subtask 2b) for detecting adverse drug reactions (ADRs) in German, Japanese, and French texts written by patients. Participants were challenged with a few-shot learning scenario, necessitating models that can effectively generalize from limited annotated examples. Despite the diverse strategies employed by the participants, the overall performance across submissions from three teams highlighted significant challenges. The results underscored the complexity of extracting entities and relations in multi-lingual contexts, especially from the noisy and informal nature of user-generated content. Further research is required to develop robust systems capable of accurately identifying and associating ADR-related information in low-resource and multilingual settings.

For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared tasks in English, Japanese, German, French, and Spanish from Twitter, Reddit, and health forums. A total of 84 teams from 22 countries registered for #SMM4H, and 45 teams participated in at least one task. This represents a growth of 180% and 160% in registration and participation, respectively, compared to the last iteration. This paper provides an overview of the tasks and participating systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.

Findings of the WMT 2024 Biomedical Translation Shared Task: Test Sets on Abstract Level
Mariana Neves | Cristian Grozea | Philippe Thomas | Roland Roller | Rachel Bawden | Aurélie Névéol | Steffen Castle | Vanessa Bonato | Giorgio Maria Di Nunzio | Federica Vezzani | Maika Vicente Navarro | Lana Yeganova | Antonio Jimeno Yepes
Proceedings of the Ninth Conference on Machine Translation

We present the results of the ninth edition of the Biomedical Translation Task at WMT’24. We released test sets for six language pairs, namely, French, German, Italian, Portuguese, Russian, and Spanish, from and into English. Eachtest set consists of 50 abstracts from PubMed. Differently from previous years, we did not split abstracts into sentences. We received submissions from five teams, and for almost all language directions. We used a baseline/comparison system based on Llama 3.1 and share the source code at https://github.com/cgrozea/wmt24biomed-ref.

2023

Clinical Text Anonymization, its Influence on Downstream NLP Tasks and the Risk of Re-Identification
Iyadh Ben Cheikh Larbi | Aljoscha Burchardt | Roland Roller
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

While text-based medical applications have become increasingly prominent, access to clinicaldata remains a major concern. To resolve this issue, further de-identification and anonymization of the data are required. This might, however, alter the contextual information within the clinical texts and therefore influence the learning and performance of possible language models. This paper systematically analyses the potential effects of various anonymization techniques on the performance of state-of-the-art machine learning models based on several datasets corresponding to five different NLP tasks. On this basis, we derive insightful findings and recommendations concerning text anonymization with regard to the performance of machine learning models. In addition, we present a simple re-identification attack applied to the anonymized text data, which can break the anonymization.

Findings of the WMT 2023 Biomedical Translation Shared Task: Evaluation of ChatGPT 3.5 as a Comparison System
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol | Rachel Bawden | Giorgio Maria Di Nunzio | Roland Roller | Philippe Thomas | Federica Vezzani | Maika Vicente Navarro | Lana Yeganova | Dina Wiemann | Cristian Grozea
Proceedings of the Eighth Conference on Machine Translation

We present an overview of the Biomedical Translation Task that was part of the Eighth Conference on Machine Translation (WMT23). The aim of the task was the automatic translation of biomedical abstracts from the PubMed database. It included twelve language directions, namely, French, Spanish, Portuguese, Italian, German, and Russian, from and into English. We received submissions from 18 systems and for all the test sets that we released. Our comparison system was based on ChatGPT 3.5 and performed very well in comparison to many of the submissions.

2022

Subjective Text Complexity Assessment for German
Laura Seiffe | Fares Kallel | Sebastian Möller | Babak Naderi | Roland Roller
Proceedings of the Thirteenth Language Resources and Evaluation Conference

For different reasons, text can be difficult to read and understand for many people, especially if the text’s language is too complex. In order to provide suitable text for the target audience, it is necessary to measure its complexity. In this paper we describe subjective experiments to assess the readability of German text. We compile a new corpus of sentences provided by a German IT service provider. The sentences are annotated with the subjective complexity ratings by two groups of participants, namely experts and non-experts for that text domain. We then extract an extensive set of linguistically motivated features that are supposedly interacting with complexity perception. We show that a linear regression model with a subset of these features can be a very good predictor of text complexity.

An Annotated Corpus of Textual Explanations for Clinical Decision Support
Roland Roller | Aljoscha Burchardt | Nils Feldhus | Laura Seiffe | Klemens Budde | Simon Ronicke | Bilgin Osmanodja
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In recent years, machine learning for clinical decision support has gained more and more attention. In order to introduce such applications into clinical practice, a good performance might be essential, however, the aspect of trust should not be underestimated. For the treating physician using such a system and being (legally) responsible for the decision made, it is particularly important to understand the system’s recommendation. To provide insights into a model’s decision, various techniques from the field of explainability (XAI) have been proposed whose output is often enough not targeted to the domain experts that want to use the model. To close this gap, in this work, we explore how explanations could possibly look like in future. To this end, this work presents a dataset of textual explanations in context of decision support. Within a reader study, human physicians estimated the likelihood of possible negative patient outcomes in the near future and justified each decision with a few sentences. Using those sentences, we created a novel corpus, annotated with different semantic layers. Moreover, we provide an analysis of how those explanations are constructed, and how they change depending on physician, on the estimated risk and also in comparison to an automatic clinical decision support system with feature importance.

Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient’s Perspective
Lisa Raithel | Philippe Thomas | Roland Roller | Oliver Sapina | Sebastian Möller | Pierre Zweigenbaum
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common in social media data in this domain, the class labels of the corpus are very imbalanced. This and a high topic imbalance make it a very challenging dataset, since often, the same symptom can have several causes and is not always related to a medication intake. We aim to encourage further multi-lingual efforts in the domain of ADR detection and provide preliminary experiments for binary classification using different methods of zero- and few-shot learning based on a multi-lingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1-score of 37.52 for the positive class. We make the dataset and models publicly available for the community.

In the seventh edition of the WMT Biomedical Task, we addressed a total of seven languagepairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, English/Italian. This year’s test sets covered three types of biomedical text genre. In addition to scientific abstracts and terminology items used in previous editions, we released test sets of clinical cases. The evaluation of clinical cases translations were given special attention by involving clinicians in the preparation of reference translations and manual evaluation. For the main MEDLINE test sets, we received a total of 609 submissions from 37 teams. For the ClinSpEn sub-task, we had the participation of five teams.

2021

In the sixth edition of the WMT Biomedical Task, we addressed a total of eight language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, English/Italian, and English/Basque. Further, our tests were composed of three types of textual test sets. New to this year, we released a test set of summaries of animal experiments, in addition to the test sets of scientific abstracts and terminologies. We received a total of 107 submissions from 15 teams from 6 countries.

2020

From Witch’s Shot to Music Making Bones - Resources for Medical Laymen to Technical Language and Vice Versa
Laura Seiffe | Oliver Marten | Michael Mikhailov | Sven Schmeier | Sebastian Möller | Roland Roller
Proceedings of the Twelfth Language Resources and Evaluation Conference

Many people share information in social media or forums, like food they eat, sports activities they do or events which have been visited. Information we share online unveil directly or indirectly information about our lifestyle and health situation. Particularly when text input is getting longer or multiple messages can be linked to each other. Those information can be then used to detect possible risk factors of diseases or adverse drug reactions of medications. However, as most people are not medical experts, language used might be more descriptive rather than the precise medical expression as medics do. To detect and use those relevant information, laymen language has to be translated and/or linked against the corresponding medical concept. This work presents baseline data sources in order to address this challenge for German language. We introduce a new dataset which annotates medical laymen and technical expressions in a patient forum, along with a set of medical synonyms and definitions, and present first baseline results on the data.

Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five language pairs were previously addressed in past editions of the shared task, namely, English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional languages pairs were also introduced this year: English/Russian, English/Italian, and English/Basque. The task addressed the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations in terms of automatic scores and qualitative evaluations, compared to previous years.

2019

Exploring Diachronic Changes of Biomedical Knowledge using Distributed Concept Representations
Gaurav Vashisth | Jan-Niklas Voigt-Antons | Michael Mikhailov | Roland Roller
Proceedings of the 18th BioNLP Workshop and Shared Task

In research best practices can change over time as new discoveries are made and novel methods are implemented. Scientific publications reporting about the latest facts and current state-of-the-art can be possibly outdated after some years or even proved to be false. A publication usually sheds light only on the knowledge of the period it has been published. Thus, the aspect of time can play an essential role in the reliability of the presented information. In Natural Language Processing many methods focus on information extraction from text, such as detecting entities and their relationship to each other. Those methods mostly focus on the facts presented in the text itself and not on the aspects of knowledge which changes over time. This work instead examines the evolution in biomedical knowledge over time using scientific literature in terms of diachronic change. Mainly the usage of temporal and distributional concept representations are explored and evaluated by a proof-of-concept.

2018

Football and Beer - a Social Media Analysis on Twitter in Context of the FIFA Football World Cup 2018
Roland Roller | Philippe Thomas | Sven Schmeier
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

In many societies alcohol is a legal and common recreational substance and socially accepted. Alcohol consumption often comes along with social events as it helps people to increase their sociability and to overcome their inhibitions. On the other hand we know that increased alcohol consumption can lead to serious health issues, such as cancer, cardiovascular diseases and diseases of the digestive system, to mention a few. This work examines alcohol consumption during the FIFA Football World Cup 2018, particularly the usage of alcohol related information on Twitter. For this we analyse the tweeting behaviour and show that the tournament strongly increases the interest in beer. Furthermore we show that countries who had to leave the tournament at early stage might have done something good to their fans as the interest in beer decreased again.

2017

Annotation of Entities and Relations in Spanish Radiology Reports
Viviana Cotik | Darío Filippo | Roland Roller | Hans Uszkoreit | Feiyu Xu
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Radiology reports express the results of a radiology study and contain information about anatomical entities, findings, measures and impressions of the medical doctor. The use of information extraction techniques can help physicians to access this information in order to understand data and to infer further knowledge. Supervised machine learning methods are very popular to address information extraction, but are usually domain and language dependent. To train new classification models, annotated data is required. Moreover, annotated data is also required as an evaluation resource of information extraction algorithms. However, one major drawback of processing clinical data is the low availability of annotated datasets. For this reason we performed a manual annotation of radiology reports written in Spanish. This paper presents the corpus, the annotation schema, the annotation guidelines and further insight of the data.

2016

A fine-grained corpus annotation schema of German nephrology records
Roland Roller | Hans Uszkoreit | Feiyu Xu | Laura Seiffe | Michael Mikhailov | Oliver Staeck | Klemens Budde | Fabian Halleck | Danilo Schmidt
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

In this work we present a fine-grained annotation schema to detect named entities in German clinical data of chronically ill patients with kidney diseases. The annotation schema is driven by the needs of our clinical partners and the linguistic aspects of German language. In order to generate annotations within a short period, the work also presents a semi-automatic annotation which uses additional sources of knowledge such as UMLS, to pre-annotate concepts in advance. The presented schema will be used to apply novel techniques from natural language processing and machine learning to support doctors treating their patients by improved information access from unstructured German texts.

Negation Detection in Clinical Reports Written in German
Viviana Cotik | Roland Roller | Feiyu Xu | Hans Uszkoreit | Klemens Budde | Danilo Schmidt
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

An important subtask in clinical text mining tries to identify whether a clinical finding is expressed as present, absent or unsure in a text. This work presents a system for detecting mentions of clinical findings that are negated or just speculated. The system has been applied to two different types of German clinical texts: clinical notes and discharge summaries. Our approach is built on top of NegEx, a well known algorithm for identifying non-factive mentions of medical findings. In this work, we adjust a previous adaptation of NegEx to German and evaluate the system on our data to detect negation and speculation. The results are compared to a baseline algorithm and are analyzed for both types of clinical documents. Our system achieves an F1-Score above 0.9 on both types of reports.

2015

Improving distant supervision using inference learning
Roland Roller | Eneko Agirre | Aitor Soroa | Mark Stevenson
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Held-out versus Gold Standard: Comparison of Evaluation Strategies for Distantly Supervised Relation Extraction from Medline abstracts
Roland Roller | Mark Stevenson
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

Making the most of limited training data using distant supervision
Roland Roller | Mark Stevenson
Proceedings of BioNLP 15

2014

Applying UMLS for Distantly Supervised Relation Detection
Roland Roller | Mark Stevenson
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

2013

Identification of Genia Events using Multiple Classifiers
Roland Roller | Mark Stevenson
Proceedings of the BioNLP Shared Task 2013 Workshop

Co-authors

Antonio Jimeno Yepes 6

Mariana Neves 6

Rachel Bawden 5

Giorgio Maria Di Nunzio 5

Mark Stevenson 5

Federica Vezzani 5

Maika Vicente Navarro 5

Lana Yeganova 5

Pierre Zweigenbaum 5

Klemens Budde 4

Aljoscha Burchardt 4

Michael Mikhailov 3

Sven Schmeier 3

Hans Uszkoreit 3

Shuntaro Yada 3

David Buschhüter 2

Viviana Cotik 2

Inigo Jauregi Unanue 2

David Martinez Iraola 2

Tomohiro Nishiyama 2

Bilgin Osmanodja 2

Benjamin Paassen 2

Alonso Palomino 2

Olatz Perez-de-Vinaspre 2

Niels Pinkwart 2

Ajay Madhavan Ravichandran 2

Danilo Schmidt 2

Tobias Strapatsas 2

Shoko Wakamiya 2

Hui-Syuan Yeh 2

Ibrahim Baroud 1

Iyadh Ben Cheikh Larbi 1

Ondřej Bojar 1

Vanessa Bonato 1

Steffen Castle 1

Peter Dabrock 1

Justus-Jonas Erker 1

Darryl Johan Estrada 1

Eulalia Farre-maduel 1

Darío Filippo 1

Andreas Fischer 1

Guillermo Garcia 1

Graciela Gonzalez 1

Julianna Grune 1

Christel Gérardin 1

Fabian Halleck 1

Leonhard Hennig 1

Sophia Hernandez 1

Anne Herrmann 1

Madeleine Kittner 1

Dietrich Klakow 1

Martin Krallinger 1

Thomas Lavergne 1

Yvonne Lichtblau 1

Salvador Lima-López 1

Oliver Marten 1

Faraz Maschhur 1

Yuji Matsumoto 1

Alexander Meyer 1

Florian Oetke 1

Katrin Ostermann 1

Karen O’Connor 1

Rimantas Palunis 1

Patrick Paroubek 1

Massimo Piccardi 1

Raul Rodriguez-Esteban 1

Simon Ronicke 1

Daniel Röder 1

Oliver Sapina 1

Vishakha Sharma 1

Oliver Staeck 1

Saskia Trescher 1

Gaurav Vashisth 1

Bhuvanesh Verma 1

Karin Verspoor 1

Jan-Niklas Voigt-Antons 1

Davy Weissenbacher 1

Venues