Evaluating Extrapolation Ability of Large Language Model in Chemical Domain

Taehun Cha, Donghun Lee


Abstract
Solving a problem outside the training space, i.e. extrapolation, has been a long-standing problem in the machine learning community. The current success of large language models demonstrates the LLM's ability to extrapolate to several unseen tasks. In line with these works, we evaluate the LLM's extrapolation ability in the chemical domain. We construct a dataset measuring the material properties of epoxy polymers depending on various raw materials and curing processes. The LLM should predict the material property when a novel raw material is introduced, utilizing its chemical knowledge. Through experiments, we find that the LLM tends to choose the right direction of adjustment but fails to determine the exact degree, resulting in poor MAE on some properties. However, with only a one-shot example, the LLM can successfully adjust the degree. The results show that an LLM can extrapolate to new, unseen materials by utilizing the chemical knowledge learned through massive pre-training.
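As an illustration of the evaluation protocol sketched in the abstract, the snippet below shows how a zero-shot and a one-shot prompt might be built for a held-out epoxy formulation and how MAE could be computed over the LLM's numeric predictions. This is a minimal sketch under stated assumptions: the `query_llm` stub, the record fields, the target property (tensile strength), and the prompt wording are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of the zero-/one-shot evaluation described in the abstract.
# `query_llm` is a placeholder for any chat-completion call; the record fields,
# prompt wording, and target property are illustrative, not taken from the paper.

def query_llm(prompt: str) -> float:
    """Placeholder: send `prompt` to an LLM and parse a single float from its reply."""
    raise NotImplementedError

def build_prompt(record: dict, example: dict | None = None) -> str:
    """Ask for a property of an epoxy formulation; optionally prepend a one-shot example."""
    shot = ""
    if example is not None:
        shot = (
            f"Example -- resin: {example['resin']}, hardener: {example['hardener']}, "
            f"cure: {example['cure']} -> tensile strength: {example['value']} MPa\n"
        )
    return (
        shot
        + f"Resin: {record['resin']}, hardener: {record['hardener']}, cure: {record['cure']}. "
        "Predict the tensile strength in MPa. Answer with a single number."
    )

def mean_absolute_error(preds: list[float], targets: list[float]) -> float:
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)

# Usage (illustrative): compare zero-shot vs. one-shot MAE on held-out formulations.
# test_set = [{"resin": ..., "hardener": ..., "cure": ..., "value": ...}, ...]
# zero_shot = [query_llm(build_prompt(r)) for r in test_set]
# one_shot  = [query_llm(build_prompt(r, example=train_example)) for r in test_set]
# print(mean_absolute_error(zero_shot, [r["value"] for r in test_set]))
```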
Anthology ID:
2024.langmol-1.4
Volume:
Proceedings of the 1st Workshop on Language + Molecules (L+M 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Carl Edwards, Qingyun Wang, Manling Li, Lawrence Zhao, Tom Hope, Heng Ji
Venues:
LangMol | WS
Association for Computational Linguistics
28–33
https://aclanthology.org/2024.langmol-1.4
Cite (ACL):
Taehun Cha and Donghun Lee. 2024. Evaluating Extrapolation Ability of Large Language Model in Chemical Domain. In Proceedings of the 1st Workshop on Language + Molecules (L+M 2024), pages 28–33, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Evaluating Extrapolation Ability of Large Language Model in Chemical Domain (Cha & Lee, LangMol-WS 2024)
PDF:
https://aclanthology.org/2024.langmol-1.4.pdf