Identifying Cellular Niches in Spatial Transcriptomics: An Investigation into the Capabilities of Large Language Models

Huanhuan Wei; Xiao Luo; Hongyi Yu; Jinping Liang; Luning Yang; Lixing Lin; Alexandra Popa; Xiting Yan

doi:10.18653/v1/2025.acl-long.455

Abstract

Spatial transcriptomic technologies enable simultaneous measurement of gene expression profiles and spatial information of cells in tissues. Clustering the captured cells/spots in spatial transcriptomic data is crucial for understanding tissue niches and uncovering disease-related changes. Current methods for clustering spatial transcriptomic data face several obstacles, including inefficiency in handling multi-replicate data, lack of prior knowledge incorporation, and uninterpretable cluster labels. We introduce a novel approach, LLMiniST, that identifies spatial niches using zero-shot large language models (LLMs) by transforming spatial transcriptomic data into spatial context prompts, leveraging gene expression of neighboring cells/spots, cell type composition, tissue information, and external knowledge. The model was further enhanced using a two-stage fine-tuning strategy for improved generalizability. We also develop a user-friendly annotation tool to accelerate the creation of well-annotated spatial datasets for fine-tuning. Comprehensive performance evaluations showed that both zero-shot and fine-tuned LLMiniST outperformed current non-LLM methods in many circumstances. Notably, the two-stage fine-tuning strategy facilitated substantial cross-subject generalizability. The results demonstrate the feasibility of LLMs for tissue niche identification using spatial transcriptomic data and the potential of LLMs as a scalable solution to efficiently integrate minimal human guidance for improved performance in large-scale datasets.
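To make the idea of a "spatial context prompt" concrete, the following is a minimal sketch, not the LLMiniST implementation. It assumes each cell is a dictionary with hypothetical keys `xy`, `cell_type`, and `top_genes`, and that the prompt simply summarizes the cell type composition and most frequent marker genes of the k nearest neighbors. The function names, prompt wording, tissue name, and candidate niche labels are all illustrative assumptions; the abstract does not specify the actual template.

```python
from collections import Counter
from math import dist


def nearest_neighbors(cells, index, k):
    """Return the k cells closest to cells[index], excluding the cell itself."""
    center = cells[index]["xy"]
    others = [c for i, c in enumerate(cells) if i != index]
    return sorted(others, key=lambda c: dist(center, c["xy"]))[:k]


def build_spatial_prompt(cells, index, tissue, candidate_niches, k=3):
    """Summarize a cell's neighborhood as text (hypothetical prompt template)."""
    neighbors = nearest_neighbors(cells, index, k)

    # Cell type composition of the neighborhood, listed alphabetically.
    composition = Counter(c["cell_type"] for c in neighbors)
    comp_text = ", ".join(f"{n} {t}" for t, n in sorted(composition.items()))

    # Genes that recur most often among the neighbors' top genes;
    # ties are broken alphabetically so the output is deterministic.
    gene_counts = Counter(g for c in neighbors for g in c["top_genes"])
    ranked = sorted(gene_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    genes_text = ", ".join(g for g, _ in ranked[:5])

    return (
        f"Tissue: {tissue}\n"
        f"Center cell type: {cells[index]['cell_type']}\n"
        f"Neighboring cell types: {comp_text}\n"
        f"Frequent neighbor genes: {genes_text}\n"
        f"Candidate niches: {', '.join(candidate_niches)}\n"
        "Which candidate niche best describes this cell's neighborhood?"
    )


# Toy data for illustration only; coordinates and labels are made up.
toy_cells = [
    {"xy": (0.0, 0.0), "cell_type": "B cell", "top_genes": ["MS4A1", "CD79A"]},
    {"xy": (1.0, 0.0), "cell_type": "T cell", "top_genes": ["CD3E", "IL7R"]},
    {"xy": (0.0, 1.0), "cell_type": "T cell", "top_genes": ["CD3E", "CCR7"]},
    {"xy": (1.0, 1.0), "cell_type": "B cell", "top_genes": ["MS4A1", "CD19"]},
    {"xy": (9.0, 9.0), "cell_type": "Fibroblast", "top_genes": ["COL1A1"]},
]

prompt = build_spatial_prompt(
    toy_cells, index=0, tissue="lymph node",
    candidate_niches=["B cell follicle", "T cell zone"],
)
print(prompt)
```

The resulting string would be sent to an LLM of choice; tissue information and the candidate label list are where prior knowledge could be injected under this sketch.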

Anthology ID: 2025.acl-long.455
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 9275–9289
URL: https://aclanthology.org/2025.acl-long.455/
DOI: 10.18653/v1/2025.acl-long.455
Cite (ACL): Huanhuan Wei, Xiao Luo, Hongyi Yu, Jinping Liang, Luning Yang, Lixing Lin, Alexandra Popa, and Xiting Yan. 2025. Identifying Cellular Niches in Spatial Transcriptomics: An Investigation into the Capabilities of Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9275–9289, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Identifying Cellular Niches in Spatial Transcriptomics: An Investigation into the Capabilities of Large Language Models (Wei et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.455.pdf
