@article{zhang-etal-2025-active-knowledge,
title = "Active Knowledge Structuring for Large Language Models in Materials Science Text Mining",
author = "Zhang, Xin and
Yuan, Jingling and
Zhang, Peiliang and
Liu, Jia and
Li, Lin",
journal = "Transactions of the Association for Computational Linguistics",
volume = "13",
year = "2025",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/2025.tacl-1.55/",
doi = "10.1162/tacl.a.36",
pages = "1186--1203",
abstract = "Large Language Models (LLMs) offer a promising alternative to traditional Materials Science Text Mining (MSTM) by reducing the need for extensive data labeling and fine-tuning. However, existing zero-/few-shot methods still face limitations in aligning with personalized needs in scientific discovery. To address this, we propose ClassMATe, an active knowledge structuring approach for MSTM. Specifically, we first propose a class definition stylization method to structure knowledge, enabling explicit clustering of latent material knowledge in LLMs for enhanced inference. To align with the scientists' needs, we propose an active needs refining strategy that iteratively clarifies needs by learning from uncertainty-aware hard samples of LLMs, further refining the knowledge structuring. Extensive experiments on seven tasks and eight datasets show that ClassMATe, as a plug-and-play method, achieves performance comparable to supervised learning without requiring fine-tuning or an extra knowledge base, highlighting the potential to bridge the gap between LLMs' latent knowledge and real-world scientific applications."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="zhang-etal-2025-active-knowledge">
<titleInfo>
<title>Active Knowledge Structuring for Large Language Models in Materials Science Text Mining</title>
</titleInfo>
<name type="personal">
<namePart type="given">Xin</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jingling</namePart>
<namePart type="family">Yuan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Peiliang</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jia</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lin</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<genre authority="bibutilsgt">journal article</genre>
<relatedItem type="host">
<titleInfo>
<title>Transactions of the Association for Computational Linguistics</title>
</titleInfo>
<originInfo>
<issuance>continuing</issuance>
<publisher>MIT Press</publisher>
<place>
<placeTerm type="text">Cambridge, MA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">periodical</genre>
<genre authority="bibutilsgt">academic journal</genre>
</relatedItem>
<abstract>Large Language Models (LLMs) offer a promising alternative to traditional Materials Science Text Mining (MSTM) by reducing the need for extensive data labeling and fine-tuning. However, existing zero-/few-shot methods still face limitations in aligning with personalized needs in scientific discovery. To address this, we propose ClassMATe, an active knowledge structuring approach for MSTM. Specifically, we first propose a class definition stylization method to structure knowledge, enabling explicit clustering of latent material knowledge in LLMs for enhanced inference. To align with the scientists’ needs, we propose an active needs refining strategy that iteratively clarifies needs by learning from uncertainty-aware hard samples of LLMs, further refining the knowledge structuring. Extensive experiments on seven tasks and eight datasets show that ClassMATe, as a plug-and-play method, achieves performance comparable to supervised learning without requiring fine-tuning or an extra knowledge base, highlighting the potential to bridge the gap between LLMs’ latent knowledge and real-world scientific applications.</abstract>
<identifier type="citekey">zhang-etal-2025-active-knowledge</identifier>
<identifier type="doi">10.1162/tacl.a.36</identifier>
<location>
<url>https://aclanthology.org/2025.tacl-1.55/</url>
</location>
<part>
<date>2025</date>
<detail type="volume"><number>13</number></detail>
<extent unit="page">
<start>1186</start>
<end>1203</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Journal Article
%T Active Knowledge Structuring for Large Language Models in Materials Science Text Mining
%A Zhang, Xin
%A Yuan, Jingling
%A Zhang, Peiliang
%A Liu, Jia
%A Li, Lin
%J Transactions of the Association for Computational Linguistics
%D 2025
%V 13
%I MIT Press
%C Cambridge, MA
%F zhang-etal-2025-active-knowledge
%X Large Language Models (LLMs) offer a promising alternative to traditional Materials Science Text Mining (MSTM) by reducing the need for extensive data labeling and fine-tuning. However, existing zero-/few-shot methods still face limitations in aligning with personalized needs in scientific discovery. To address this, we propose ClassMATe, an active knowledge structuring approach for MSTM. Specifically, we first propose a class definition stylization method to structure knowledge, enabling explicit clustering of latent material knowledge in LLMs for enhanced inference. To align with the scientists’ needs, we propose an active needs refining strategy that iteratively clarifies needs by learning from uncertainty-aware hard samples of LLMs, further refining the knowledge structuring. Extensive experiments on seven tasks and eight datasets show that ClassMATe, as a plug-and-play method, achieves performance comparable to supervised learning without requiring fine-tuning or an extra knowledge base, highlighting the potential to bridge the gap between LLMs’ latent knowledge and real-world scientific applications.
%R 10.1162/tacl.a.36
%U https://aclanthology.org/2025.tacl-1.55/
%U https://doi.org/10.1162/tacl.a.36
%P 1186-1203
Markdown (Informal)
[Active Knowledge Structuring for Large Language Models in Materials Science Text Mining](https://aclanthology.org/2025.tacl-1.55/) (Zhang et al., TACL 2025)
ACL
Xin Zhang, Jingling Yuan, Peiliang Zhang, Jia Liu, and Lin Li. 2025. Active Knowledge Structuring for Large Language Models in Materials Science Text Mining. Transactions of the Association for Computational Linguistics, 13:1186–1203.