Daniel Sonntag

2025

Advancing Biomedical Claim Verification by Using Large Language Models with Better Structured Prompting Strategies
Siting Liang | Daniel Sonntag
Proceedings of the 24th Workshop on Biomedical Language Processing

In this work, we propose a structured four-step prompting strategy that explicitly guides large language models (LLMs) through (1) claim comprehension, (2) evidence analysis, (3) intermediate conclusion, and (4) entailment decision-making to improve the accuracy of biomedical claim verification. This strategy leverages compositional, human-like reasoning to enhance logical consistency and factual grounding, reducing reliance on memorized few-shot exemplars and helping LLMs generalize reasoning patterns across different biomedical claim verification tasks. Through extensive evaluation on biomedical NLI benchmarks, we analyze the individual contributions of each reasoning step. Our findings demonstrate that comprehension, evidence analysis, and intermediate conclusion each play distinct yet complementary roles. Systematic prompting and carefully designed step-wise instructions not only unlock the latent cognitive abilities of LLMs but also enhance interpretability by making it easier to trace errors and understand the model’s reasoning process. Our research aims to improve the reliability of AI-driven biomedical claim verification.

Human and LLM-based Assessment of Teaching Acts in Expert-led Explanatory Dialogues
Aliki Anagnostopoulou | Nils Feldhus | Yi-Sheng Hsu | Milad Alshomary | Henning Wachsmuth | Daniel Sonntag
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

Understanding the strategies that make expert-led explanations effective is a core challenge in didactics and a key goal for explainable AI. To study this computationally, we introduce ReWIRED, a large corpus of explanatory dialogues annotated by education experts with fine-grained, span-level teaching acts across five levels of explainee knowledge. We use this resource to assess the capabilities of modern language models, finding that while few-shot LLMs struggle to label these acts, fine-tuning is a highly effective methodology. Moving beyond structural annotation, we propose and validate a suite of didactic quality metrics. We demonstrate that a prompt-based evaluation using an LLM as a “judge” is required to capture how the functional quality of an explanation aligns with the learner’s expertise – a nuance missed by simpler static metrics. Together, our dataset, modeling insights, and evaluation framework provide a comprehensive methodology to bridge pedagogical principles with computational discourse analysis.

2024

Building A German Clinical Named Entity Recognition System without In-domain Training Data
Siting Liang | Daniel Sonntag
Proceedings of the 6th Clinical Natural Language Processing Workshop

Clinical Named Entity Recognition (NER) is essential for extracting important medical insights from clinical narratives. Given the challenges in obtaining expert training datasets for real-world clinical applications related to data protection regulations and the lack of standardised entity types, this work represents a collaborative initiative aimed at building a German clinical NER system with a focus on addressing these obstacles effectively. In response to the challenge of training data scarcity, we propose a Conditional Relevance Learning (CRL) approach in low-resource transfer learning scenarios. CRL effectively leverages a pre-trained language model and domain-specific open resources, enabling the acquisition of a robust base model tailored for clinical NER tasks, particularly in the face of changing label sets. This flexibility empowers the implementation of a Multilayered Semantic Annotation (MSA) schema in our NER system, capable of organizing a diverse array of entity types, thus significantly boosting the NER system’s adaptability and utility across various clinical domains. In the case study, we demonstrate how our NER system can be applied to overcome resource constraints and comply with data privacy regulations. Lacking prior training on in-domain data, feedback from expert users in respective domains is essential in identifying areas for system refinement. Future work will focus on the integration of expert feedback to improve system performance in specific clinical contexts.

Optimizing Relation Extraction in Medical Texts through Active Learning: A Comparative Analysis of Trade-offs
Siting Liang | Pablo Valdunciel Sánchez | Daniel Sonntag
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)

This work explores the effectiveness of employing Clinical BERT for Relation Extraction (RE) tasks in medical texts within an Active Learning (AL) framework. Our main objective is to optimize RE in medical texts through AL while examining the trade-offs between performance and computation time, comparing it with alternative methods like Random Forest and BiLSTM networks. Comparisons extend to feature engineering requirements, performance metrics, and considerations of annotation costs, including AL step times and annotation rates. The utilization of AL strategies aligns with our broader goal of enhancing the efficiency of relation classification models, particularly when dealing with the challenges of annotating complex medical texts in a Human-in-the-Loop (HITL) setting. The results indicate that uncertainty-based sampling achieves comparable performance with significantly fewer annotated samples across three categories of supervised learning methods, thereby reducing annotation costs for clinical and biomedical corpora. While Clinical BERT exhibits clear performance advantages across two different corpora, the trade-off involves longer computation times in interactive annotation processes. In real-world applications, where practical feasibility and timely results are crucial, optimizing this trade-off becomes imperative.

2023

Cross-domain German Medical Named Entity Recognition using a Pre-Trained Language Model and Unified Medical Semantic Types
Siting Liang | Mareike Hartmann | Daniel Sonntag
Proceedings of the 5th Clinical Natural Language Processing Workshop

Information extraction from clinical text has the potential to facilitate clinical research and personalized clinical care, but annotating large amounts of data for each set of target tasks is prohibitive. We present a German medical Named Entity Recognition (NER) system capable of cross-domain knowledge transfer. The system builds on a pre-trained German language model and a token-level binary classifier, employing semantic types sourced from the Unified Medical Language System (UMLS) as entity labels to identify corresponding entity spans within the input text. To enhance the system’s performance and robustness, we pre-train it using a medical literature corpus that incorporates UMLS semantic term annotations. We evaluate the system’s effectiveness on two German annotated datasets obtained from different clinics in zero- and few-shot settings. The results show that our approach outperforms task-specific Conditional Random Fields (CRF) classifiers in terms of accuracy. Our work contributes to developing robust and transparent German medical NER models that can support the extraction of information from various clinical texts.

Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory
Aliki Anagnostopoulou | Mareike Hartmann | Daniel Sonntag
Proceedings of the Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)

2022

A survey on improving NLP models with human explanations
Mareike Hartmann | Daniel Sonntag
Proceedings of the First Workshop on Learning with Natural Language Supervision

Training a model with access to human explanations can improve data efficiency and model performance on in- and out-of-domain data. Adding to these empirical findings, similarity with the process of human learning makes learning from explanations a promising way to establish a fruitful human-machine interaction. Several methods have been proposed for improving natural language processing (NLP) models with human explanations that rely on different explanation types and mechanisms for integrating these explanations into the learning process. These methods are rarely compared with each other, making it hard for practitioners to choose the best combination of explanation type and integration mechanism for a specific use-case. In this paper, we give an overview of different methods for learning from human explanations, and discuss different factors that can inform the decision of which method to choose for a specific use-case.

2019

Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings
Marimuthu Kalimuthu | Michael Barz | Daniel Sonntag
Proceedings of the Fourth Arabic Natural Language Processing Workshop

We study the problem of incremental domain adaptation of a generic neural machine translation model with limited resources (e.g., budget and time) for human translations or model training. In this paper, we propose a novel query strategy for selecting “unlabeled” samples from a new domain based on sentence embeddings for Arabic. We accelerate the fine-tuning process of the generic model to the target domain. Specifically, our approach estimates the informativeness of instances from the target domain by comparing the distance of their sentence embeddings to embeddings from the generic domain. We perform machine translation experiments (Ar-to-En direction) for comparing a random sampling baseline with our new approach, similar to active learning, using two small update sets for simulating the work of human translators. For the prescribed setting we can save more than 50% of the annotation costs without loss in quality, demonstrating the effectiveness of our approach.

2017

A Multimodal Dialogue System for Medical Decision Support inside Virtual Reality
Alexander Prange | Margarita Chikobava | Peter Poller | Michael Barz | Daniel Sonntag
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We present a multimodal dialogue system that allows doctors to interact with a medical decision support system in virtual reality (VR). We integrate an interactive visualization of patient records and radiology image data, as well as therapy predictions. Therapy predictions are computed in real-time using a deep learning model.

2010

Speech Grammars for Textual Entailment Patterns in Multimodal Question Answering
Daniel Sonntag | Bogdan Sacaleanu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Over the last several years, speech-based question answering (QA) has become very popular in contrast to pure search engine based approaches on a desktop. Open-domain QA systems are now much more powerful and precise, and they can be used in speech applications. Speech-based question answering systems often rely on predefined grammars for speech understanding. In order to improve the coverage of such complex AI systems, we reused speech patterns used to generate textual entailment patterns. These can make multimodal question understanding more robust. We exemplify this in the context of a domain-specific dialogue scenario. As a result, written text input components (e.g., in a textual input field) can deal with more flexible input according to the derived textual entailment patterns. A multimodal QA dialogue spanning over several domains of interest, i.e., personal address book entries, questions about the music domain and politicians and other celebrities, demonstrates how the textual input mode can be used in a multimodal dialogue shell.

2008

Semiotic-based Ontology Evaluation Tool (S-OntoEval)
Renata Dividino | Massimo Romanelli | Daniel Sonntag
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The objective of the Semiotic-based Ontology Evaluation Tool (S-OntoEval) is to evaluate and propose improvements to a given ontological model. The evaluation aims at assessing the quality of the ontology by drawing upon semiotic theory, taking several metrics into consideration for assessing the syntactic, semantic, and pragmatic aspects of ontology quality. We consider an ontology to be a semiotic object and we identify three main types of semiotic ontology evaluation levels: the structural level, assessing the ontology syntax and formal semantics; the functional level, assessing the ontology cognitive semantics; and the usability-related level, assessing the ontology pragmatics. The Ontology Evaluation Tool implements metrics for each semiotic ontology level: on the structural level by making use of reasoners such as the RACER System and Pellet to check the logical consistency of our ontological model (TBoxes and ABoxes) and graph-theory measures such as Depth; on the functional level by making use of a task-based evaluation approach which measures the quality of the ontology based on the adequacy of the ontological model for a specific task; and on the usability-profiling level by applying a quantitative analysis of the amount of annotation. Other metrics can be easily integrated and added to the respective evaluation level. In this work, the Ontology Evaluation Tool is used to test and evaluate the SWIntO Ontology of the SmartWeb project.

2006

A Multimodal Result Ontology for Integrated Semantic Web Dialogue Applications
Daniel Sonntag | Massimo Romanelli
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

General purpose ontologies and domain ontologies make up the infrastructure of the Semantic Web, which allow for accurate data representations with relations, and data inferences. In our approach to multimodal dialogue systems providing question answering functionality (SMARTWEB), the ontological infrastructure is essential. We aim at an integrated approach in which all knowledge-aware system modules are based on interoperating ontologies in a common data model. The discourse ontology is meant to provide the necessary dialogue- and HCI concepts. We present the ontological syntactic structure of multimodal question answering results as part of this discourse ontology which extends the W3C EMMA annotation framework and uses MPEG-7 annotations. In addition, we describe an extension to ontological result structures where automatic and context-based sorting mechanisms can be naturally incorporated.