Automated Molecular Concept Generation and Labeling with Large Language Models

Zimin Zhang; Qianli Wu; Botao Xia; Fang Sun; Ziniu Hu; Yizhou Sun; Shichang Zhang

Abstract

Artificial intelligence (AI) is transforming scientific research, with explainable AI methods like concept-based models (CMs) showing promise for new discoveries. However, in molecular science, CMs are less common than black-box models like Graph Neural Networks (GNNs), due to their need for predefined concepts and manual labeling. This paper introduces the Automated Molecular Concept (AutoMolCo) framework, which leverages Large Language Models (LLMs) to automatically generate and label predictive molecular concepts. Through iterative concept refinement, AutoMolCo enables simple linear models to outperform GNNs and LLM in-context learning on several benchmarks. The framework operates without human knowledge input, overcoming limitations of existing CMs while maintaining explainability and allowing easy intervention. Experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets demonstrate that AutoMolCo-induced explainable CMs are beneficial for molecular science research.

Anthology ID: 2025.coling-main.462
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 6918–6936
URL: https://aclanthology.org/2025.coling-main.462/
Cite (ACL): Zimin Zhang, Qianli Wu, Botao Xia, Fang Sun, Ziniu Hu, Yizhou Sun, and Shichang Zhang. 2025. Automated Molecular Concept Generation and Labeling with Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6918–6936, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Automated Molecular Concept Generation and Labeling with Large Language Models (Zhang et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.462.pdf
