Jacqueline C.k. Lam


2025

pdf bib
An LLM-based Temporal-spatial Data Generation and Fusion Approach for Early Detection of Late Onset Alzheimer’s Disease (LOAD) Stagings Especially in Chinese and English-speaking Populations
Yang Han | Jacqueline C.k. Lam | Victor O.k. Li | Lawrence Y. L. Cheung
Findings of the Association for Computational Linguistics: EMNLP 2025

Alzheimer’s Disease (AD), the 7th leading cause of death globally, demands scalable methods for early detection. While speech-based diagnostics offer promise, existing approaches struggle with temporal-spatial (T-S) challenges in capturing subtle linguistic shifts across different disease stages (temporal) and in adapting to cross-linguistic variability (spatial). This study introduces a novel Large Language Model (LLM)-driven T-S fusion framework that integrates multilingual LLMs, contrastive learning, and interpretable marker discovery to revolutionize Late Onset AD (LOAD) detection. Our key innovations include: (1) T-S Data Imputation: Leveraging LLMs to generate synthetic speech transcripts across different LOAD stages (NC, Normal Control; eMCI, early Mild Cognitive Impairment; lMCI, late Mild Cognitive Impairment; AD) and languages (Chinese, English, Spanish), addressing data scarcity while preserving clinical relevance (expert validation: 86% agreement with LLM-generated labels). (2) T-S Transformer with Contrastive Learning: A multilingual model that disentangles stage-specific (temporal) and language-specific (spatial) patterns, achieving a notable improvement of 10.9–24.7% in F1-score over existing baselines. (3) Cross-Linguistic Marker Discovery: Identifying language-agnostic markers and language-specific patterns to enhance interpretability for clinical adoption. By unifying temporal LOAD stages and spatial diversity, our framework achieves state-of-the-art performance in early LOAD detection while enabling cross-linguistic diagnostics. This study bridges NLP and clinical neuroscience, demonstrating LLMs’ potential to amplify limited biomedical data and advance equitable healthcare AI.