LoSST-AD: A Longitudinal Corpus for Tracking Alzheimer’s Disease Related Changes in Spontaneous Speech

Ulla Petti, Anna Korhonen


Abstract
Language-based biomarkers have shown promising results in differentiating people with an Alzheimer’s disease (AD) diagnosis from healthy individuals, but the earliest changes in language are thought to start years or even decades before diagnosis. Detecting these changes is critical to allow early interventions, but research into the earliest signs is challenging, as it requires large longitudinal datasets that are time-consuming and expensive to collect. There is a need for alternative methods for tracking longitudinal language change, including Natural Language Processing (NLP) and speech recognition technologies. We present a novel corpus that can enable this: transcripts of public interviews, recorded over several decades, with 20 famous figures, half of whom were later diagnosed with AD. We evaluate the corpus by validating patterns of change in vocabulary richness known from the literature, such as declines in noun frequency, word length, and several other features. We show that public data can be used to collect longitudinal datasets without causing extra stress for participants, and that these data can adequately reflect longitudinal AD-related changes in vocabulary richness. Our corpus can provide a valuable starting point for the development of early detection tools and enhance our understanding of how AD affects language over time.
Anthology ID:
2024.lrec-main.944
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
10813–10821
URL:
https://aclanthology.org/2024.lrec-main.944
Cite (ACL):
Ulla Petti and Anna Korhonen. 2024. LoSST-AD: A Longitudinal Corpus for Tracking Alzheimer’s Disease Related Changes in Spontaneous Speech. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10813–10821, Torino, Italia. ELRA and ICCL.
Cite (Informal):
LoSST-AD: A Longitudinal Corpus for Tracking Alzheimer’s Disease Related Changes in Spontaneous Speech (Petti & Korhonen, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.944.pdf