Jonathan Heitz
2025
Linguistic Features Extracted by GPT-4 Improve Alzheimer’s Disease Detection based on Spontaneous Speech
Jonathan Heitz
|
Gerold Schneider
|
Nicolas Langer
Proceedings of the 31st International Conference on Computational Linguistics
Alzheimer’s Disease (AD) is a significant and growing public health concern. Investigating alterations in speech and language patterns offers a promising path towards cost-effective and non-invasive early detection of AD on a large scale. Large language models (LLMs), such as GPT, have enabled powerful new possibilities for semantic text analysis. In this study, we leverage GPT-4 to extract five semantic features from transcripts of spontaneous patient speech. The features capture known symptoms of AD, but they are difficult to quantify effectively using traditional methods of computational linguistics. We demonstrate the clinical significance of these features and further validate one of them (“Word-Finding Difficulties”) against a proxy measure and human raters. When combined with established linguistic features and a Random Forest classifier, the GPT-derived features significantly improve the detection of AD. Our approach proves effective for both manually transcribed and automatically generated transcripts, representing a novel and impactful use of recent advancements in LLMs for AD speech analysis.
2024
The Influence of Automatic Speech Recognition on Linguistic Features and Automatic Alzheimer’s Disease Detection from Spontaneous Speech
Jonathan Heitz
|
Gerold Schneider
|
Nicolas Langer
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Alzheimer’s disease (AD) represents a major problem for society and a heavy burden for those affected. The study of changes in speech offers a potential means for large-scale AD screening that is non-invasive and inexpensive. Automatic Speech Recognition (ASR) is necessary for a fully automated system. We compare different ASR systems in terms of Word Error Rate (WER) using a publicly available benchmark dataset of speech recordings of AD patients and controls. Furthermore, this study is the first to quantify how popular linguistic features change when replacing manual transcriptions with ASR output. This contributes to the understanding of linguistic features in the context of AD detection. Moreover, we investigate how ASR affects AD classification performance by implementing two popular approaches: A fine-tuned BERT model, and Random Forest on popular linguistic features. Our results show best classification performance when using manual transcripts, but the degradation when using ASR is not dramatic. Performance stays strong, achieving an AUROC of 0.87. Our BERT-based approach is affected more strongly by ASR transcription errors than the simpler and more explainable approach based on linguistic features.