Wolfgang Nejdl


pdf bib
Data Drift in Clinical Outcome Prediction from Admission Notes
Paul Grundmann | Jens-Michalis Papaioannou | Tom Oberhauser | Thomas Steffek | Amy Siu | Wolfgang Nejdl | Alexander Loeser
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Clinical NLP research faces a scarcity of publicly available datasets due to privacy concerns. MIMIC-III marked a significant milestone, enabling substantial progress, and now, with MIMIC-IV, the dataset has expanded significantly, offering a broader scope. In this paper, we focus on the task of predicting clinical outcomes from clinical text. This is crucial in modern healthcare, aiding in preventive care, differential diagnosis, and capacity planning. We introduce a novel clinical outcome prediction dataset derived from MIMIC-IV. Furthermore, we provide initial insights into the performance of models trained on MIMIC-III when applied to our new dataset, with specific attention to potential data drift. We investigate challenges tied to evolving documentation standards and changing codes in the International Classification of Diseases (ICD) taxonomy, such as the transition from ICD-9 to ICD-10. We also explore variations in clinical text across different hospital wards. Our study aims to probe the robustness and generalization of clinical outcome prediction models, contributing to the ongoing advancement of clinical NLP in healthcare.

pdf bib
TIGQA: An Expert-Annotated Question-Answering Dataset in Tigrinya
Hailay Kidu Teklehaymanot | Dren Fazlija | Niloy Ganguly | Gourab Kumar Patro | Wolfgang Nejdl
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The absence of explicitly tailored, accessible annotated datasets for educational purposes presents a notable obstacle for NLP tasks in languages with limited resources. This study initially explores the feasibility of using machine translation (MT) to convert an existing dataset into a Tigrinya dataset in SQuAD format. As a result, we present TIGQA, an expert-annotated dataset containing 2,685 question-answer pairs covering 122 diverse topics such as climate, water, and traffic. These pairs are from 537 context paragraphs in publicly accessible Tigrinya and Biology books. Through comprehensive analyses, we demonstrate that the TIGQA dataset requires skills beyond simple word matching, requiring both single-sentence and multiple-sentence inference abilities. We conduct experiments using state-of-the-art MRC methods, marking the first exploration of such models on TIGQA. Additionally, we estimate human performance on the dataset and juxtapose it with the results obtained from pre-trained models. The notable disparities between human performance and the best model performance underscore the potential for fu- ture enhancements to TIGQA through continued research. Our dataset is freely accessible via the provided link to encourage the research community to address the challenges in the Tigrinya MRC. Keywords: Tigrinya QA dataset, Low resource QA dataset, domain specific QA

pdf bib
Revisiting Clinical Outcome Prediction for MIMIC-IV
Tom Röhr | Alexei Figueroa | Jens-Michalis Papaioannou | Conor Fallon | Keno Bressem | Wolfgang Nejdl | Alexander Löser
Proceedings of the 6th Clinical Natural Language Processing Workshop

Clinical Decision Support Systems assist medical professionals in providing optimal care for patients.A prominent data source used for creating tasks for such systems is the Medical Information Mart for Intensive Care (MIMIC).MIMIC contains electronic health records (EHR) gathered in a tertiary hospital in the United States.The majority of past work is based on the third version of MIMIC, although the fourth is the most recent version.This new version, not only introduces more data into MIMIC, but also increases the variety of patients.While MIMIC-III is limited to intensive care units, MIMIC-IV also offers EHRs from the emergency department.In this work, we investigate how to adapt previous work to update clinical outcome prediction for MIMIC-IV.We revisit several established tasks, including prediction of diagnoses, procedures, length-of-stay, and also introduce a novel task: patient routing prediction.Furthermore, we quantitatively and qualitatively evaluate all tasks on several bio-medical transformer encoder models.Finally, we provide narratives for future research directions in the clinical outcome prediction domain. We make our source code publicly available to reproduce our experiments, data, and tasks.


pdf bib
Toxicity, Morality, and Speech Act Guided Stance Detection
Apoorva Upadhyaya | Marco Fisichella | Wolfgang Nejdl
Findings of the Association for Computational Linguistics: EMNLP 2023

In this work, we focus on the task of determining the public attitude toward various social issues discussed on social media platforms. Platforms such as Twitter, however, are often used to spread misinformation, fake news through polarizing views. Existing literature suggests that higher levels of toxicity prevalent in Twitter conversations often spread negativity and delay addressing issues. Further, the embedded moral values and speech acts specifying the intention of the tweet correlate with public opinions expressed on various topics. However, previous works, which mainly focus on stance detection, either ignore the speech act, toxic, and moral features of these tweets that can collectively help capture public opinion or lack an efficient architecture that can detect the attitudes across targets. Therefore, in our work, we focus on the main task of stance detection by exploiting the toxicity, morality, and speech act as auxiliary tasks. We propose a multitasking model TWISTED that initially extracts the valence, arousal, and dominance aspects hidden in the tweets and injects the emotional sense into the embedded text followed by an efficient attention framework to correctly detect the tweet’s stance by using the shared features of toxicity, morality, and speech acts present in the tweet. Extensive experiments conducted on 4 benchmark stance detection datasets (SemEval-2016, P-Stance, COVID19-Stance, and ClimateChange) comprising different domains demonstrate the effectiveness and generalizability of our approach.


pdf bib
This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Clinical Text
Betty van Aken | Jens-Michalis Papaioannou | Marcel Naik | Georgios Eleftheriadis | Wolfgang Nejdl | Felix Gers | Alexander Loeser
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The use of deep neural models for diagnosis prediction from clinical text has shown promising results. However, in clinical practice such models must not only be accurate, but provide doctors with interpretable and helpful results. We introduce ProtoPatient, a novel method based on prototypical networks and label-wise attention with both of these abilities. ProtoPatient makes predictions based on parts of the text that are similar to prototypical patients—providing justifications that doctors understand. We evaluate the model on two publicly available clinical datasets and show that it outperforms existing baselines. Quantitative and qualitative evaluations with medical doctors further demonstrate that the model provides valuable explanations for clinical decision support.


pdf bib
A Trio Neural Model for Dynamic Entity Relatedness Ranking
Tu Nguyen | Tuan Tran | Wolfgang Nejdl
Proceedings of the 22nd Conference on Computational Natural Language Learning

Measuring entity relatedness is a fundamental task for many natural language processing and information retrieval applications. Prior work often studies entity relatedness in a static setting and unsupervised manner. However, entities in real-world are often involved in many different relationships, consequently entity relations are very dynamic over time. In this work, we propose a neural network-based approach that leverages public attention as supervision. Our model is capable of learning rich and different entity representations in a joint framework. Through extensive experiments on large-scale datasets, we demonstrate that our method achieves better results than competitive baselines.


pdf bib
Cross-Corpus Textual Entailment for Sublanguage Analysis in Epidemic Intelligence
Avaré Stewart | Kerstin Denecke | Wolfgang Nejdl
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Textual entailment has been recognized as a generic task that captures major semantic inference needs across many natural language processing applications. However, to date, textual entailment has not been considered in a cross-corpus setting, nor for user generated content. Given the emergence of Medicine 2.0, medical blogs are becoming an increasingly accepted source of information. However, given the characteristics of blogs( which tend to be noisy and informal; or contain a interspersing of subjective and factual sentences) a potentially large amount of irrelevant information may be present. Given the potential noise, the overarching problem with respect to information extraction from social media is achieving the correct level of sentence filtering - as opposed to document or blog post level. Specifically for the task of medical intelligence gathering. In this paper, we propose an approach to textual entailment with uses the text from one source of user generated content (T text) for sentence-level filtering within a new and less amenable one (H text), when the underlying domain, tasks or semantic information is the same, or overlaps.