Yanjun Shao
2024
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes
He Cao | Yanjun Shao | Zhiyuan Liu | Zijing Liu | Xiangru Tang | Yuan Yao | Yu Li
Findings of the Association for Computational Linguistics: EMNLP 2024
Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conducting chemical reactions to synthesize new compounds with desired properties and applications. Current approaches, however, often neglect the critical role of multi-molecule graph interaction in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks. This study introduces PRESTO (Progressive Pretraining Enhances Synthetic Chemistry Outcomes), a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations. It progressively improves multimodal LLMs through cross-modal alignment and multi-graph understanding. Our extensive experiments demonstrate that PRESTO offers competitive results in downstream synthetic chemistry tasks. The code can be found at https://github.com/IDEA-XL/PRESTO.
Co-authors
- He Cao 1
- Zhiyuan Liu 1
- Zijing Liu 1
- Xiangru Tang 1
- Yuan Yao 1
- Yu Li 1