Pathway2Text: Dataset and Method for Biomedical Pathway Description Generation

Junwei Yang, Zequn Liu, Ming Zhang, Sheng Wang


Abstract
Biomedical pathways have been extensively used to characterize the mechanisms of complex diseases. One essential step in biomedical pathway analysis is to curate the description of a pathway based on its graph structure and node features. Neural text generation is a plausible technique for avoiding this tedious manual curation. In this paper, we propose a new dataset, Pathway2Text, which contains 2,367 pairs of biomedical pathways and textual descriptions. All pathway graphs are experimentally derived or manually curated, and all textual descriptions are written by domain experts. We formulate this problem as a Graph2Text task and propose a novel graph-based text generation approach, kNN-Graph2Text, which explicitly exploits the descriptions of similar graphs to generate new descriptions. We observe substantial improvements from our method on both Graph2Text and the reverse task, Text2Graph. We further illustrate how our dataset can be used as a novel benchmark for biomedical named entity recognition. Collectively, we envision that our method will become an important benchmark for evaluating Graph2Text methods and will advance biomedical research on complex diseases.
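The abstract says that kNN-Graph2Text exploits descriptions of similar graphs, but it does not say how similarity is measured or how the retrieved descriptions are used. The sketch below is only an illustration of the general retrieval idea, not the authors' implementation. It assumes, hypothetically, that each pathway is reduced to a set of node names and that similarity is Jaccard overlap between those sets. The names `jaccard`, `retrieve_neighbor_descriptions`, and `toy_corpus` are invented for this example, and the toy pathways are not taken from the dataset.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of node names (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_neighbor_descriptions(
    query_nodes: Set[str],
    corpus: Dict[str, Tuple[Set[str], str]],
    k: int = 2,
) -> List[str]:
    """Return the descriptions of the k pathways most similar to the query.

    `corpus` maps a pathway name to (node set, description). Ties in
    similarity are broken by pathway name so the result is deterministic.
    """
    ranked = sorted(
        corpus.items(),
        key=lambda item: (-jaccard(query_nodes, item[1][0]), item[0]),
    )
    return [description for _, (_, description) in ranked[:k]]


# Toy, made-up pathways for illustration only (not from Pathway2Text).
toy_corpus = {
    "pathway_a": ({"TP53", "MDM2", "CDKN1A"}, "Toy description A."),
    "pathway_b": ({"EGFR", "KRAS", "BRAF"}, "Toy description B."),
    "pathway_c": ({"TP53", "MDM2", "BAX"}, "Toy description C."),
}

neighbors = retrieve_neighbor_descriptions(
    {"TP53", "MDM2", "ATM"}, toy_corpus, k=2
)
print(neighbors)
```

In a retrieval-augmented setup of this kind, the retrieved descriptions would typically be passed to a text generator as additional context alongside the query graph; that step is omitted here because the abstract gives no details about it.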
Anthology ID:
2022.findings-naacl.108
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1441–1454
URL:
https://aclanthology.org/2022.findings-naacl.108
DOI:
10.18653/v1/2022.findings-naacl.108
Cite (ACL):
Junwei Yang, Zequn Liu, Ming Zhang, and Sheng Wang. 2022. Pathway2Text: Dataset and Method for Biomedical Pathway Description Generation. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1441–1454, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Pathway2Text: Dataset and Method for Biomedical Pathway Description Generation (Yang et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.108.pdf
Video:
https://aclanthology.org/2022.findings-naacl.108.mp4
Code:
yjwtheonly/pathway2text
Data:
Bio