Leveraging LLMs and Web-based Visualizations for Profiling Bacterial Host Organisms and Genetic Toolboxes

Gilchan Park, Vivek Mutalik, Christopher Neely, Carlos Soto, Shinjae Yoo, Paramvir Dehal


Abstract
Building genetic tools to engineer microorganisms is at the core of understanding and redesigning natural biological systems for useful purposes. Every project to build such a genetic toolbox for an organism starts with a survey of available tools. Despite a decade-long investment and advancement in the field, it is still challenging to mine information about a genetic tool published in the literature and connect that information to microbial genomics and other microbial databases. This information gap not only limits our ability to identify and adopt available tools to a new chassis but also conceals available opportunities to engineer a new microbial host. Recent advances in natural language processing (NLP), particularly large language models (LLMs), offer solutions by enabling efficient extraction of genetic terms and biological entities from a vast array of publications. This work present a method to automate this process, using text-mining to refine models with data from bioRxiv and other databases. We evaluated various LLMs to investigate their ability to recognize bacterial host organisms and genetic toolboxes for engineering. We demonstrate our methodology with a web application that integrates a conversational LLM and visualization tool, connecting user inquiries to genetic resources and literature findings, thereby saving researchers time, money and effort in their laboratory work.
Anthology ID:
2024.bionlp-1.28
Volume:
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Note:
Pages:
370–379
Language:
URL:
https://aclanthology.org/2024.bionlp-1.28
DOI:
Bibkey:
Cite (ACL):
Gilchan Park, Vivek Mutalik, Christopher Neely, Carlos Soto, Shinjae Yoo, and Paramvir Dehal. 2024. Leveraging LLMs and Web-based Visualizations for Profiling Bacterial Host Organisms and Genetic Toolboxes. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 370–379, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Leveraging LLMs and Web-based Visualizations for Profiling Bacterial Host Organisms and Genetic Toolboxes (Park et al., BioNLP-WS 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.bionlp-1.28.pdf