Shichang Zhang
2025
Automated Molecular Concept Generation and Labeling with Large Language Models
Zimin Zhang | Qianli Wu | Botao Xia | Fang Sun | Ziniu Hu | Yizhou Sun | Shichang Zhang
Proceedings of the 31st International Conference on Computational Linguistics
Artificial intelligence (AI) is transforming scientific research, with explainable AI methods like concept-based models (CMs) showing promise for new discoveries. However, in molecular science, CMs are less common than black-box models like Graph Neural Networks (GNNs), because CMs require predefined concepts and manual labeling. This paper introduces the Automated Molecular Concept (AutoMolCo) framework, which leverages Large Language Models (LLMs) to automatically generate and label predictive molecular concepts. Through iterative concept refinement, AutoMolCo enables simple linear models to outperform GNNs and LLM in-context learning on several benchmarks. The framework operates without human knowledge input, overcoming limitations of existing CMs while maintaining explainability and allowing easy intervention. Experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets demonstrate that AutoMolCo-induced explainable CMs are beneficial for molecular science research.
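The abstract describes a generate-label-refine loop: an LLM proposes concepts, labels them for each molecule, a simple linear model is fit on the labels, and the concepts are iteratively refined. Below is a minimal Python sketch of that loop under stated assumptions; the `ask_llm` helper, the prompts, and the refinement heuristic (replacing the concept with the smallest linear coefficient) are illustrative stand-ins, not the paper's actual implementation.

```python
# Illustrative sketch of the AutoMolCo-style loop described in the abstract.
# ask_llm is a hypothetical placeholder for a real LLM API call; prompts and
# the refinement heuristic below are assumptions, not the paper's prompts.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM client call; wire up a real model here."""
    raise NotImplementedError

def generate_concepts(task: str, k: int = 5) -> list[str]:
    # Step 1: the LLM proposes k candidate molecular concepts for the task.
    reply = ask_llm(f"List {k} molecular properties predictive of {task}, one per line.")
    return [c.strip() for c in reply.splitlines() if c.strip()]

def label_concepts(smiles: list[str], concepts: list[str]) -> np.ndarray:
    # Step 2: the LLM assigns each molecule a numeric value for each concept.
    X = np.zeros((len(smiles), len(concepts)))
    for i, s in enumerate(smiles):
        for j, c in enumerate(concepts):
            X[i, j] = float(ask_llm(f"Value of '{c}' for molecule {s}? Reply with a number."))
    return X

def automolco(smiles: list[str], y: np.ndarray, task: str, n_rounds: int = 3):
    concepts = generate_concepts(task)
    for _ in range(n_rounds):
        X = label_concepts(smiles, concepts)
        clf = LogisticRegression().fit(X, y)            # simple, explainable CM
        weakest = int(np.argmin(np.abs(clf.coef_[0])))  # least predictive concept
        # Step 3: iterative refinement -- swap out the weakest concept.
        concepts[weakest] = ask_llm(
            f"Suggest one molecular concept for {task} not already in {concepts}."
        ).strip()
    return clf, concepts
```

Because the final predictor is a linear model over named concepts, its coefficients can be read directly as concept importances, which is what makes inspection and intervention straightforward.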
2024
FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion
Fred Xu | Song Jiang | Zijie Huang | Xiao Luo | Shichang Zhang | Yuanzhou Chen | Yizhou Sun
Findings of the Association for Computational Linguistics: ACL 2024
Taxonomy Expansion, which relies on modeling concepts and concept relations, can be formulated as a set representation learning task. Fuzzy sets, a generalization of sets, incorporate uncertainty and measure the information within a semantic concept, making them suitable for concept modeling. Existing works usually model sets as vectors or geometric objects such as boxes, which are not closed under set operations. In this work, we propose a sound and efficient formulation of set representation learning based on volume approximation of a fuzzy set. The resulting embedding framework, Fuzzy Set Embedding (FUSE), satisfies all set operations and compactly approximates the underlying fuzzy set, hence preserving information while being efficient to learn and relying on a minimal neural architecture. We empirically demonstrate the power of FUSE on the task of taxonomy expansion, where FUSE achieves remarkable improvements of up to 23% over existing baselines. Our work marks the first attempt to understand and efficiently compute the embeddings of fuzzy sets.
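As a rough illustration of the idea in the abstract (fuzzy sets are closed under set operations, and their volume can be approximated compactly), here is a minimal numpy sketch. The logistic membership function, the product t-norm, and the Monte Carlo volume estimate are assumptions chosen for readability, not the paper's measure-theoretic construction.

```python
# Minimal sketch: concepts as fuzzy sets with a Monte Carlo volume approximation.
# Membership form, t-norm, and volume estimator are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(size=(4096, 8))  # fixed sample of the underlying space

def membership(weights: np.ndarray) -> np.ndarray:
    """Embed a concept as a membership function mu: space -> [0, 1]."""
    return 1.0 / (1.0 + np.exp(-points @ weights))  # logistic form, an assumption

def volume(mu: np.ndarray) -> float:
    # Fuzzy measure of a set: integral of its membership over the space,
    # approximated here by the sample mean (Monte Carlo).
    return float(mu.mean())

# Set operations via the product t-norm; each result is again a fuzzy set,
# unlike vector or box embeddings, which are not closed under these operations.
def intersect(mu_a, mu_b): return mu_a * mu_b
def union(mu_a, mu_b):     return mu_a + mu_b - mu_a * mu_b
def complement(mu_a):      return 1.0 - mu_a

def containment(mu_child: np.ndarray, mu_parent: np.ndarray) -> float:
    # For taxonomy expansion, score how much of a child concept lies inside
    # a candidate parent: vol(child AND parent) / vol(child).
    return volume(intersect(mu_child, mu_parent)) / volume(mu_child)
```

In this toy setup, ranking candidate parents by the containment score gives a simple taxonomy-expansion criterion; the learnable quantities would be the per-concept membership parameters.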