Moleco: Molecular Contrastive Learning with Chemical Language Models for Molecular Property Prediction

Jun-Hyung Park, Hyuntae Park, Yeachan Kim, Woosang Lim, SangKeun Lee


Abstract
Pre-trained chemical language models (CLMs) excel at molecular property prediction, using string-based molecular descriptors such as SMILES to learn universal representations. However, such string-based descriptors contain only limited, implicit structural information, even though this information is closely associated with molecular property prediction. In this work, we introduce Moleco, a novel contrastive learning framework that enhances the understanding of molecular structures within CLMs. Based on the similarity of fingerprint vectors among different molecules, we train CLMs to distinguish structurally similar from dissimilar molecules in a contrastive manner. Experimental results demonstrate that Moleco significantly improves the molecular property prediction performance of CLMs, outperforming state-of-the-art models. Moreover, our in-depth analysis with diverse Moleco variants verifies that fingerprint vectors are highly effective features for improving CLMs’ understanding of the structural information of molecules.
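To make the idea in the abstract concrete, here is a minimal sketch of fingerprint-guided contrastive learning. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, the loss, or the sampling scheme. The sketch assumes Tanimoto similarity over fingerprint bit sets and an InfoNCE-style loss over cosine similarities, and every name in it (tanimoto, pick_positive, contrastive_loss, the toy fingerprints and embeddings) is hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def cosine(u, v):
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def pick_positive(anchor_fp, candidate_fps):
    """Assumed sampling rule: the candidate most similar to the anchor is the positive."""
    sims = [tanimoto(anchor_fp, fp) for fp in candidate_fps]
    return max(range(len(sims)), key=sims.__getitem__)


def contrastive_loss(anchor_emb, candidate_embs, positive_index, temperature=0.1):
    """InfoNCE-style loss (assumed): negative log-softmax of the positive's similarity."""
    logits = [cosine(anchor_emb, c) / temperature for c in candidate_embs]
    top = max(logits)  # subtract the max for numerical stability
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[positive_index]


# Toy fingerprints (hypothetical on-bit indices, not derived from real molecules).
anchor_fp = {1, 2, 3, 4}
candidate_fps = [{1, 2, 3, 5}, {7, 8, 9}]
positive = pick_positive(anchor_fp, candidate_fps)

# Toy embeddings standing in for CLM outputs (hypothetical values).
anchor_emb = [1.0, 0.0]
candidate_embs = [[0.9, 0.1], [0.0, 1.0]]
loss = contrastive_loss(anchor_emb, candidate_embs, positive)
```

In this toy setup the first candidate shares three of five distinct bits with the anchor, so it is chosen as the positive, and the loss is small because its embedding already lies close to the anchor's. In practice the fingerprints would come from a cheminformatics toolkit and the embeddings from the CLM being trained.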
Anthology ID:
2024.emnlp-industry.30
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
408–420
URL:
https://aclanthology.org/2024.emnlp-industry.30
Cite (ACL):
Jun-Hyung Park, Hyuntae Park, Yeachan Kim, Woosang Lim, and SangKeun Lee. 2024. Moleco: Molecular Contrastive Learning with Chemical Language Models for Molecular Property Prediction. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 408–420, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
Moleco: Molecular Contrastive Learning with Chemical Language Models for Molecular Property Prediction (Park et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-industry.30.pdf
Poster:
 2024.emnlp-industry.30.poster.pdf
Presentation:
 2024.emnlp-industry.30.presentation.pdf