An LLM-based Temporal-spatial Data Generation and Fusion Approach for Early Detection of Late Onset Alzheimer’s Disease (LOAD) Stagings Especially in Chinese and English-speaking Populations

Yang Han; Jacqueline C.k. Lam; Victor O.k. Li; Lawrence Y. L. Cheung

doi:10.18653/v1/2025.findings-emnlp.809

An LLM-based Temporal-spatial Data Generation and Fusion Approach for Early Detection of Late Onset Alzheimer’s Disease (LOAD) Stagings Especially in Chinese and English-speaking Populations

Yang Han, Jacqueline C.k. Lam, Victor O.k. Li, Lawrence Y. L. Cheung

Abstract

Alzheimer’s Disease (AD), the 7th leading cause of death globally, demands scalable methods for early detection. While speech-based diagnostics offer promise, existing approaches struggle with temporal-spatial (T-S) challenges in capturing subtle linguistic shifts across different disease stages (temporal) and in adapting to cross-linguistic variability (spatial). This study introduces a novel Large Language Model (LLM)-driven T-S fusion framework that integrates multilingual LLMs, contrastive learning, and interpretable marker discovery to revolutionize Late Onset AD (LOAD) detection. Our key innovations include: (1) T-S Data Imputation: Leveraging LLMs to generate synthetic speech transcripts across different LOAD stages (NC, Normal Control; eMCI, early Mild Cognitive Impairment; lMCI, late Mild Cognitive Impairment; AD) and languages (Chinese, English, Spanish), addressing data scarcity while preserving clinical relevance (expert validation: 86% agreement with LLM-generated labels). (2) T-S Transformer with Contrastive Learning: A multilingual model that disentangles stage-specific (temporal) and language-specific (spatial) patterns, achieving a notable improvement of 10.9–24.7% in F1-score over existing baselines. (3) Cross-Linguistic Marker Discovery: Identifying language-agnostic markers and language-specific patterns to enhance interpretability for clinical adoption. By unifying temporal LOAD stages and spatial diversity, our framework achieves state-of-the-art performance in early LOAD detection while enabling cross-linguistic diagnostics. This study bridges NLP and clinical neuroscience, demonstrating LLMs’ potential to amplify limited biomedical data and advance equitable healthcare AI.

Anthology ID:: 2025.findings-emnlp.809
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2025
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: Findings
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 14977–14990
Language:
URL:: https://aclanthology.org/2025.findings-emnlp.809/
DOI:: 10.18653/v1/2025.findings-emnlp.809
Bibkey:
Cite (ACL):: Yang Han, Jacqueline C.k. Lam, Victor O.k. Li, and Lawrence Y. L. Cheung. 2025. An LLM-based Temporal-spatial Data Generation and Fusion Approach for Early Detection of Late Onset Alzheimer’s Disease (LOAD) Stagings Especially in Chinese and English-speaking Populations. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14977–14990, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: An LLM-based Temporal-spatial Data Generation and Fusion Approach for Early Detection of Late Onset Alzheimer’s Disease (LOAD) Stagings Especially in Chinese and English-speaking Populations (Han et al., Findings 2025)
Copy Citation:
PDF:: https://aclanthology.org/2025.findings-emnlp.809.pdf
Checklist:: 2025.findings-emnlp.809.checklist.pdf

PDF Cite Search Checklist Fix data