SEED: Semantic Knowledge Transfer for Language Model Adaptation to Materials Science

Yeachan Kim, Jun-Hyung Park, SungHo Kim, Juhyeong Park, Sangyun Kim, SangKeun Lee


Abstract
Materials science is an interdisciplinary field focused on studying and discovering the materials around us. However, due to the vast space of materials, datasets in this field are typically scarce and have limited coverage. This inherent limitation makes current adaptation methods less effective when adapting pre-trained language models (PLMs) to materials science, as these methods rely heavily on frequency information from limited downstream datasets. In this paper, we propose Semantic Knowledge Transfer (SEED), a novel vocabulary expansion method for adapting PLMs to materials science. The core strategy of SEED is to transfer the materials knowledge of lightweight embeddings into the PLMs. To this end, we introduce knowledge bridge networks, which learn to transform the latent knowledge of the materials embeddings into embeddings compatible with PLMs. By expanding the embedding layer of PLMs with these transformed embeddings, PLMs can comprehensively understand the complex terminology associated with materials science. We conduct extensive experiments across a broad range of materials-related benchmarks. The evaluation results demonstrate that SEED mitigates the aforementioned limitations of previous adaptation methods, showcasing the efficacy of transferring embedding knowledge into PLMs.
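The vocabulary expansion idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it replaces the knowledge bridge networks with a single linear map trained by plain SGD on tokens shared by both vocabularies, and every name (`train_bridge`, `expand_vocabulary`), every vector, and the example token `perovskite` is a hypothetical choice made for illustration.

```python
import random


def matvec(x, W):
    """Multiply a row vector x (length d_src) by a d_src x d_tgt matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def train_bridge(src, tgt, shared, epochs=500, lr=0.05, seed=0):
    """Fit a linear map W so that src[w] @ W approximates tgt[w] on shared tokens.

    Assumption: a linear map stands in for the paper's bridge networks.
    """
    rng = random.Random(seed)
    d_src = len(src[shared[0]])
    d_tgt = len(tgt[shared[0]])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_tgt)] for _ in range(d_src)]
    for _ in range(epochs):
        for w in shared:
            x, y = src[w], tgt[w]
            err = [p - t for p, t in zip(matvec(x, W), y)]
            # SGD step on the squared error (constant factor folded into lr).
            for i in range(d_src):
                for j in range(d_tgt):
                    W[i][j] -= lr * err[j] * x[i]
    return W


def expand_vocabulary(plm_emb, src, W, new_tokens):
    """Return a copy of the PLM embedding table extended with mapped new tokens."""
    expanded = dict(plm_emb)
    for w in new_tokens:
        if w not in expanded:
            expanded[w] = matvec(src[w], W)
    return expanded


# Toy 2-d "materials" embeddings (hypothetical values).
materials_emb = {
    "the": [1.0, 0.0],
    "of": [0.0, 1.0],
    "and": [1.0, 1.0],
    "perovskite": [2.0, -1.0],  # domain term missing from the PLM vocabulary
}
# Toy 3-d "PLM" embeddings, built here to be an exact linear image of the above.
plm_emb = {
    "the": [1.0, 0.0, 2.0],
    "of": [0.5, -1.0, 0.0],
    "and": [1.5, -1.0, 2.0],
}

shared = [w for w in materials_emb if w in plm_emb]
W = train_bridge(materials_emb, plm_emb, shared)
expanded = expand_vocabulary(plm_emb, materials_emb, W, ["perovskite"])
print(expanded["perovskite"])
```

In this toy setup the new token lands close to where the underlying linear relation places it, while the original PLM rows are left untouched. A real system would resize the model's embedding matrix and tokenizer rather than extend a dictionary.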
Anthology ID: 2024.emnlp-industry.31
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 421–428
URL: https://aclanthology.org/2024.emnlp-industry.31
Cite (ACL): Yeachan Kim, Jun-Hyung Park, SungHo Kim, Juhyeong Park, Sangyun Kim, and SangKeun Lee. 2024. SEED: Semantic Knowledge Transfer for Language Model Adaptation to Materials Science. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 421–428, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): SEED: Semantic Knowledge Transfer for Language Model Adaptation to Materials Science (Kim et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-industry.31.pdf
Poster: 2024.emnlp-industry.31.poster.pdf