@inproceedings{cao-etal-2024-presto,
title = "{PRESTO}: Progressive Pretraining Enhances Synthetic Chemistry Outcomes",
author = "Cao, He and
Shao, Yanjun and
Liu, Zhiyuan and
Liu, Zijing and
Tang, Xiangru and
Yao, Yuan and
Li, Yu",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.597",
pages = "10197--10224",
abstract = "Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conducting chemical reactions to synthesize new compounds with desired properties and applications. Current approaches, however, often neglect the critical role of multi-molecule graph interaction in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks. This study introduces PRESTO (Progressive Pretraining Enhances Synthetic Chemistry Outcomes), a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations. It progressively improves multimodal LLMs through cross-modal alignment and multi-graph understanding. Our extensive experiments demonstrate that PRESTO offers competitive results in downstream synthetic chemistry tasks. The code can be found at https://github.com/IDEA-XL/PRESTO.",
}