2025
Enhancing Future Link Prediction in Quantum Computing Semantic Networks through LLM-Initiated Node Features
Gilchan Park | Paul Baity | Byung-Jun Yoon | Adolfy Hoisie
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Quantum computing is rapidly evolving in both physics and computer science, offering the potential to solve complex problems and accelerate computational processes. The development of quantum chips necessitates understanding the correlations among diverse experimental conditions. Semantic networks built on scientific literature, representing meaningful relationships between concepts, have been used across various domains to identify knowledge gaps and novel concept combinations. Neural network-based approaches have shown promise in link prediction within these networks. This study proposes initializing node features using LLMs to enhance node representations for link prediction tasks in graph neural networks. LLMs can provide rich descriptions, reducing the need for manual feature creation and lowering costs. Our method, evaluated using various link prediction models on a quantum computing semantic network, demonstrated efficacy compared to traditional node embedding techniques.
2024
Leveraging LLMs and Web-based Visualizations for Profiling Bacterial Host Organisms and Genetic Toolboxes
Gilchan Park | Vivek Mutalik | Christopher Neely | Carlos Soto | Shinjae Yoo | Paramvir Dehal
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Building genetic tools to engineer microorganisms is at the core of understanding and redesigning natural biological systems for useful purposes. Every project to build such a genetic toolbox for an organism starts with a survey of available tools. Despite a decade-long investment and advancement in the field, it is still challenging to mine information about a genetic tool published in the literature and connect that information to microbial genomics and other microbial databases. This information gap not only limits our ability to identify and adopt available tools to a new chassis but also conceals available opportunities to engineer a new microbial host. Recent advances in natural language processing (NLP), particularly large language models (LLMs), offer solutions by enabling efficient extraction of genetic terms and biological entities from a vast array of publications. This work presents a method to automate this process, using text-mining to refine models with data from bioRxiv and other databases. We evaluated various LLMs to investigate their ability to recognize bacterial host organisms and genetic toolboxes for engineering. We demonstrate our methodology with a web application that integrates a conversational LLM and visualization tool, connecting user inquiries to genetic resources and literature findings, thereby saving researchers time, money, and effort in their laboratory work.
Evaluating Large Language Models for Predicting Protein Behavior under Radiation Exposure and Disease Conditions
Ryan Engel | Gilchan Park
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
The primary concern with exposure to ionizing radiation is the risk of developing diseases. While high doses of radiation can cause immediate damage leading to cancer, the effects of low-dose radiation (LDR) are less clear and more controversial. Investigating this further requires focusing on the underlying biological structures affected by radiation. Recent work has shown that Large Language Models (LLMs) can effectively predict protein structures and other biological properties. The aim of this research is to utilize open-source LLMs, such as Mistral, Llama 2, and Llama 3, to predict both radiation-induced alterations in proteins and the dynamics of protein-protein interactions (PPIs) in the presence of specific diseases. We show that fine-tuning these models yields state-of-the-art performance for predicting protein interactions in the context of neurodegenerative diseases, metabolic disorders, and cancer. Our findings contribute to the ongoing efforts to understand the complex relationships between radiation exposure and disease mechanisms, illustrating the nuanced capabilities and limitations of current computational models.
2023
Automated Extraction of Molecular Interactions and Pathway Knowledge using Large Language Model, Galactica: Opportunities and Challenges
Gilchan Park | Byung-Jun Yoon | Xihaier Luo | Vanessa López-Marrero | Patrick Johnstone | Shinjae Yoo | Francis Alexander
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
Understanding protein interactions and pathway knowledge is essential for comprehending living systems and investigating the mechanisms underlying various biological functions and complex diseases. While numerous databases curate such biological data obtained from literature and other sources, they are not comprehensive and require considerable effort to maintain. One mitigation strategy is to use large language models to automatically extract biological information and to explore their potential in life science research. This study presents an initial investigation of the efficacy of a large language model, Galactica, in life science research by assessing its performance on tasks involving protein interactions, pathways, and gene regulatory relation recognition. The paper details the results obtained from the model evaluation, highlights the findings, and discusses the opportunities and challenges.