Gerold Schneider

2025

pdf bib abs
Linguistic Features Extracted by GPT-4 Improve Alzheimer’s Disease Detection based on Spontaneous Speech
Jonathan Heitz | Gerold Schneider | Nicolas Langer
Proceedings of the 31st International Conference on Computational Linguistics

Alzheimer’s Disease (AD) is a significant and growing public health concern. Investigating alterations in speech and language patterns offers a promising path towards cost-effective and non-invasive early detection of AD on a large scale. Large language models (LLMs), such as GPT, have enabled powerful new possibilities for semantic text analysis. In this study, we leverage GPT-4 to extract five semantic features from transcripts of spontaneous patient speech. The features capture known symptoms of AD, but they are difficult to quantify effectively using traditional methods of computational linguistics. We demonstrate the clinical significance of these features and further validate one of them (“Word-Finding Difficulties”) against a proxy measure and human raters. When combined with established linguistic features and a Random Forest classifier, the GPT-derived features significantly improve the detection of AD. Our approach proves effective for both manually transcribed and automatically generated transcripts, representing a novel and impactful use of recent advancements in LLMs for AD speech analysis.

pdf bib
Investigating Linguistic Abilities of LLMs for Native Language Identification
Ahmet Yavuz Uluslu | Gerold Schneider
Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning

2024

pdf bib abs
NeuroTrialNER: An Annotated Corpus for Neurological Diseases and Therapies in Clinical Trial Registries
Simona Emilova Doneva | Tilia Ellendorff | Beate Sick | Jean-Philippe Goldman | Amelia Elaine Cannon | Gerold Schneider | Benjamin Victor Ineichen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Extracting and aggregating information from clinical trial registries could provide invaluable insights into the drug development landscape and advance the treatment of neurologic diseases. However, achieving this at scale is hampered by the volume of available data and the lack of an annotated corpus to assist in the development of automation tools. Thus, we introduce NeuroTrialNER, a new and fully open corpus for named entity recognition (NER). It comprises 1093 clinical trial summaries sourced from ClinicalTrials.gov, annotated for neurological diseases, therapeutic interventions, and control treatments. We describe our data collection process and the corpus in detail. We demonstrate its utility for NER using large language models and achieve a close-to-human performance. By bridging the gap in data resources, we hope to foster the development of text processing tools that help researchers navigate clinical trials data more easily.

pdf bib
Native Language Identification Improves Authorship Attribution
Ahmet Yavuz Uluslu | Gerold Schneider | Can Yildizli
Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)

pdf bib abs
The Influence of Automatic Speech Recognition on Linguistic Features and Automatic Alzheimer’s Disease Detection from Spontaneous Speech
Jonathan Heitz | Gerold Schneider | Nicolas Langer
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Alzheimer’s disease (AD) represents a major problem for society and a heavy burden for those affected. The study of changes in speech offers a potential means for large-scale AD screening that is non-invasive and inexpensive. Automatic Speech Recognition (ASR) is necessary for a fully automated system. We compare different ASR systems in terms of Word Error Rate (WER) using a publicly available benchmark dataset of speech recordings of AD patients and controls. Furthermore, this study is the first to quantify how popular linguistic features change when replacing manual transcriptions with ASR output. This contributes to the understanding of linguistic features in the context of AD detection. Moreover, we investigate how ASR affects AD classification performance by implementing two popular approaches: A fine-tuned BERT model, and Random Forest on popular linguistic features. Our results show best classification performance when using manual transcripts, but the degradation when using ASR is not dramatic. Performance stays strong, achieving an AUROC of 0.87. Our BERT-based approach is affected more strongly by ASR transcription errors than the simpler and more explainable approach based on linguistic features.

pdf bib abs
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Janis Goldzycher | Paul Röttger | Gerold Schneider
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Hate speech detection models are only as good as the data they are trained on. Datasets sourced from social media suffer from systematic gaps and biases, leading to unreliable models with simplistic decision boundaries. Adversarial datasets, collected by exploiting model weaknesses, promise to fix this problem. However, adversarial data collection can be slow and costly, and individual annotators have limited creativity. In this paper, we introduce GAHD, a new German Adversarial Hate speech Dataset comprising ca. 11k examples. During data collection, we explore new strategies for supporting annotators, to create more diverse adversarial examples more efficiently and provide a manual analysis of annotator disagreements for each strategy. Our experiments show that the resulting dataset is challenging even for state-of-the-art hate speech detection models, and that training on GAHD clearly improves model robustness. Further, we find that mixing multiple support strategies is most advantageous. We make GAHD publicly available at https://github.com/jagol/gahd.

pdf bib
Evaluating Transformers on the Ethical Question of Euthanasia
Gerold Schneider | Giovanni Spitale
Proceedings of the 9th edition of the Swiss Text Analytics Conference

2023

pdf bib abs
Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data
Janis Goldzycher | Moritz Preisig | Chantal Amrhein | Gerold Schneider
The 7th Workshop on Online Abuse and Harms (WOAH)

Most research on hate speech detection has focused on English where a sizeable amount of labeled training data is available. However, to expand hate speech detection into more languages, approaches that require minimal training data are needed. In this paper, we test whether natural language inference (NLI) models which perform well in zero- and few-shot settings can benefit hate speech detection performance in scenarios where only a limited amount of labeled data is available in the target language. Our evaluation on five languages demonstrates large performance improvements of NLI fine-tuning over direct fine-tuning in the target language. However, the effectiveness of previous work that proposed intermediate fine-tuning on English data is hard to match. Only in settings where the English training data does not match the test domain, can our customised NLI-formulation outperform intermediate fine-tuning on English. Based on our extensive experiments, we propose a set of recommendations for hate speech detection in languages where minimal labeled training data is available.

2022

pdf bib
Scaling Native Language Identification with Transformer Adapters
Ahmet Yavuz Uluslu | Gerold Schneider
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

pdf bib abs
Hypothesis Engineering for Zero-Shot Hate Speech Detection
Janis Goldzycher | Gerold Schneider
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)

Standard approaches to hate speech detection rely on sufficient available hate speech annotations. Extending previous work that repurposes natural language inference (NLI) models for zero-shot text classification, we propose a simple approach that combines multiple hypotheses to improve English NLI-based zero-shot hate speech detection. We first conduct an error analysis for vanilla NLI-based zero-shot hate speech detection and then develop four strategies based on this analysis. The strategies use multiple hypotheses to predict various aspects of an input text and combine these predictions into a final verdict. We find that the zero-shot baseline used for the initial error analysis already outperforms commercial systems and fine-tuned BERT-based hate speech detection models on HateCheck. The combination of the proposed strategies further increases the zero-shot accuracy of 79.4% on HateCheck by 7.9 percentage points (pp), and the accuracy of 69.6% on ETHOS by 10.0pp.

2020

pdf bib abs
Using Multilingual Resources to Evaluate CEFRLex for Learner Applications
Johannes Graën | David Alfter | Gerold Schneider
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Common European Framework of Reference for Languages (CEFR) defines six levels of learner proficiency, and links them to particular communicative abilities. The CEFRLex project aims at compiling lexical resources that link single words and multi-word expressions to particular CEFR levels. The resources are thought to reflect second language learner needs as they are compiled from CEFR-graded textbooks and other learner-directed texts. In this work, we investigate the applicability of CEFRLex resources for building language learning applications. Our main concerns were that vocabulary in language learning materials might be sparse, i.e. that not all vocabulary items that belong to a particular level would also occur in materials for that level, and, on the other hand, that vocabulary items might be used on lower-level materials if required by the topic (e.g. with a simpler paraphrasing or translation). Our results indicate that the English CEFRLex resource is in accordance with external resources that we jointly employ as gold standard. Together with other values obtained from monolingual and parallel corpora, we can indicate which entries need to be adjusted to obtain values that are even more in line with this gold standard. We expect that this finding also holds for the other languages

We give an overview of our approach to the extraction of interactions between pharmacogenomic entities like drugs, genes and diseases and suggest classes of interaction types driven by data from PharmGKB and partly following the top level ontology WordNet and biomedical types from BioNLP. Our text mining approach to the extraction of interactions is based on syntactic analysis. We use syntactic analyses to explore domain events and to suggest a set of interaction labels for the pharmacogenomics domain.

2011

pdf bib
An Incremental Model for the Coreference Resolution Task of BioNLP 2011
Don Tuggener | Manfred Klenner | Gerold Schneider | Simon Clematide | Fabio Rinaldi
Proceedings of BioNLP Shared Task 2011 Workshop

2009

pdf bib
UZurich in the BioNLP 2009 Shared Task
Kaarel Kaljurand | Gerold Schneider | Fabio Rinaldi
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2008

pdf bib abs
Dependency-Based Relation Mining for Biomedical Literature
Fabio Rinaldi | Gerold Schneider | Kaarel Kaljurand | Michael Hess
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe techniques for the automatic detection of relationships among domain entities (e.g. genes, proteins, diseases) mentioned in the biomedical literature. Our approach is based on the adaptive selection of candidate interactions sentences, which are then parsed using our own dependency parser. Specific syntax-based filters are used to limit the number of possible candidate interacting pairs. The approach has been implemented as a demonstrator over a corpus of 2000 richly annotated MedLine abstracts, and later tested by participation to a text mining competition. In both cases, the results obtained have proved the adequacy of the proposed approach to the task of interaction detection.