GAMIC: Graph-Aligned Molecular In-context Learning for Molecule Analysis via LLMs

Ali Al Lawati, Jason S Lucas, Zhiwei Zhang, Prasenjit Mitra, Suhang Wang


Abstract
In-context learning (ICL) effectively conditions large language models (LLMs) for molecular tasks, such as property prediction and molecule captioning, by embedding carefully selected demonstration examples into the input prompt. This approach eliminates the computational overhead of extensive pre-training and fine-tuning. However, current prompt retrieval methods for molecular tasks rely on molecule feature similarity, such as Morgan fingerprints, which do not adequately capture the global molecular and atom-binding relationships. As a result, these methods fail to represent the full complexity of molecular structures during inference. Moreover, medium-sized LLMs, which offer simpler deployment requirements in specialized systems, have remained largely unexplored in the molecular ICL literature. To address these gaps, we propose a self-supervised learning technique, GAMIC (Graph-Aligned Molecular In-Context learning), which aligns global molecular structures, represented by graph neural networks (GNNs), with textual captions (descriptions) while leveraging local feature similarity through Morgan fingerprints. In addition, we introduce a Maximum Marginal Relevance (MMR) based diversity heuristic during retrieval to optimize input prompt demonstration samples. Our experimental findings using diverse benchmark datasets show GAMIC outperforms simple Morgan-based ICL retrieval methods across all tasks by up to 45%. Our code is available at: https://github.com/aliwister/mol-icl.
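The MMR-based diversity heuristic mentioned in the abstract can be illustrated with a minimal sketch of greedy Maximum Marginal Relevance selection. This is not the authors' implementation: the abstract does not specify the similarity function, the trade-off weight, or the embedding source, so cosine similarity, the parameter `lam`, and the function names `cosine` and `mmr_select` are all assumptions made for illustration. The vectors stand in for whatever molecule embeddings the retriever produces.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(query, candidates, k, lam=0.5):
    """Greedy MMR: return indices of up to k candidates, in selection order.

    Each step picks the candidate maximizing
        lam * sim(query, c) - (1 - lam) * max over selected s of sim(c, s).
    lam is a hypothetical trade-off weight: lam=1.0 ranks purely by relevance,
    while smaller values penalize near-duplicates of already chosen items.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected = []
    remaining = list(range(len(candidates)))

    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is the closest match among already selected items.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


if __name__ == "__main__":
    # Toy embeddings: candidates 0 and 1 are near-duplicates of each other.
    query = [1.0, 0.0]
    pool = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]
    print(mmr_select(query, pool, k=2, lam=0.5))
    print(mmr_select(query, pool, k=2, lam=1.0))
```

In this toy pool, a balanced `lam` skips the near-duplicate second candidate in favor of the less similar third one, whereas `lam=1.0` reduces to plain top-k retrieval by similarity.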
Anthology ID:
2025.findings-emnlp.996
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
18364–18378
URL:
https://aclanthology.org/2025.findings-emnlp.996/
Cite (ACL):
Ali Al Lawati, Jason S Lucas, Zhiwei Zhang, Prasenjit Mitra, and Suhang Wang. 2025. GAMIC: Graph-Aligned Molecular In-context Learning for Molecule Analysis via LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18364–18378, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
GAMIC: Graph-Aligned Molecular In-context Learning for Molecule Analysis via LLMs (Al Lawati et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.996.pdf
Checklist:
2025.findings-emnlp.996.checklist.pdf