Prompt-based Generation of Natural Language Explanations of Synthetic Lethality for Cancer Drug Discovery

Ke Zhang, Yimiao Feng, Jie Zheng


Abstract
Synthetic lethality (SL) offers a promising approach for targeted anti-cancer therapy. A deep understanding of the mechanisms behind SL gene pairs is vital for anti-cancer drug discovery. However, current wet-lab and machine learning-based SL prediction methods lack explanations that are user-friendly and can be evaluated quantitatively. To address these problems, we propose a prompt-based pipeline for generating natural language explanations. We first construct a natural language dataset named NexLeth, which is derived from New Bing through prompt-based queries and expert annotations and contains 707 instances. NexLeth enhances the understanding of SL mechanisms and serves as a benchmark for evaluating SL explanation methods. For the task of natural language generation for SL explanations, we combine subgraph explanations from an SL knowledge graph (KG) with instructions to construct novel personalized prompts, so as to inject domain knowledge into the generation process. We then use these prompts to fine-tune pre-trained biomedical language models on our dataset. Experimental results show that the fine-tuned model equipped with the designed prompts performs better than existing biomedical language models in terms of text quality and explainability, suggesting the potential of our dataset and the fine-tuned model for generating understandable and reliable explanations of SL mechanisms.
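As a rough illustration of what combining a KG subgraph with an instruction could look like, the sketch below linearizes a few triples and prepends an instruction for one gene pair. It is not the authors' implementation: the function names, the prompt template, the relation labels, and the example triples are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A KG edge as (head, relation, tail); this schema is an assumption.
Triple = Tuple[str, str, str]


def linearize_subgraph(triples: List[Triple]) -> str:
    """Turn KG triples into a semicolon-separated plain-text string."""
    return "; ".join(f"{h} {r.replace('_', ' ')} {t}" for h, r, t in triples)


def build_prompt(gene_a: str, gene_b: str, triples: List[Triple]) -> str:
    """Combine an instruction with the linearized subgraph for one gene pair."""
    # Hypothetical instruction wording, not taken from the paper.
    instruction = (
        f"Explain why {gene_a} and {gene_b} may form a synthetic lethal pair."
    )
    knowledge = linearize_subgraph(triples)
    return f"{instruction}\nKnowledge: {knowledge}\nExplanation:"


# Illustrative triples only; not drawn from the paper's knowledge graph.
example_triples = [
    ("BRCA1", "participates_in", "homologous recombination repair"),
    ("PARP1", "participates_in", "single-strand break repair"),
]
prompt = build_prompt("BRCA1", "PARP1", example_triples)
print(prompt)
```

A prompt built this way would be paired with a reference explanation from the dataset when fine-tuning a language model; the exact prompt design used in the paper may differ.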
Anthology ID:
2024.lrec-main.1150
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
13131–13142
URL:
https://aclanthology.org/2024.lrec-main.1150
Cite (ACL):
Ke Zhang, Yimiao Feng, and Jie Zheng. 2024. Prompt-based Generation of Natural Language Explanations of Synthetic Lethality for Cancer Drug Discovery. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13131–13142, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Prompt-based Generation of Natural Language Explanations of Synthetic Lethality for Cancer Drug Discovery (Zhang et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1150.pdf
Optional supplementary material:
 2024.lrec-main.1150.OptionalSupplementaryMaterial.pdf