A Survey of Large Language Models for Text-Guided Molecular Discovery: From Molecule Generation to Optimization

Ziqing Wang; Kexin Zhang; Zihan Zhao; Yibo Wen; Abhishek Pandey; Han Liu; Kaize Ding

Abstract

Large language models (LLMs) are introducing a paradigm shift in molecular discovery by enabling text-guided interaction with chemical spaces through natural language and symbolic notations, with recent extensions that incorporate multi-modal inputs. To advance this field, this survey provides an up-to-date and forward-looking review of the use of LLMs for two central tasks: molecule generation and molecule optimization. We organize the survey around four fundamental challenges that have become critical evaluation dimensions in recent studies: ensuring validity, enhancing synthesizability, achieving precise property control, and maximizing diversity. Using these challenges as a framework, we systematically analyze how current LLM learning paradigms are applied to each one, revealing the distinct capabilities and inherent limitations of each approach. In addition, we summarize the commonly used datasets and evaluation protocols aligned with these challenges. We conclude by discussing future directions, positioning this survey as a resource for researchers working at the intersection of LLMs and molecular science. A continuously updated reading list is available at https://github.com/REAL-Lab-NU/Awesome-LLM-Centric-Molecular-Discovery.

Anthology ID: 2026.acl-long.2026
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 43738–43769
URL: https://aclanthology.org/2026.acl-long.2026/
Cite (ACL): Ziqing Wang, Kexin Zhang, Zihan Zhao, Yibo Wen, Abhishek Pandey, Han Liu, and Kaize Ding. 2026. A Survey of Large Language Models for Text-Guided Molecular Discovery: From Molecule Generation to Optimization. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 43738–43769, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): A Survey of Large Language Models for Text-Guided Molecular Discovery: From Molecule Generation to Optimization (Wang et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2026.pdf
Checklist: 2026.acl-long.2026.checklist.pdf
