Biomedical Data-to-Text Generation via Fine-Tuning Transformers

Ruslan Yermakov, Nicholas Drago, Angelo Ziletti


Abstract
Data-to-text (D2T) generation in the biomedical domain is a promising, yet mostly unexplored, field of research. Here, we apply neural models for D2T generation to a real-world dataset consisting of package leaflets of European medicines. We show that fine-tuned transformers are able to generate realistic, multi-sentence text from data in the biomedical domain, yet have important limitations. We also release a new dataset (BioLeaflets) for benchmarking D2T generation models in the biomedical domain.
Anthology ID:
2021.inlg-1.40
Volume:
Proceedings of the 14th International Conference on Natural Language Generation
Month:
August
Year:
2021
Address:
Aberdeen, Scotland, UK
Editors:
Anya Belz, Angela Fan, Ehud Reiter, Yaji Sripada
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
364–370
URL:
https://aclanthology.org/2021.inlg-1.40
DOI:
10.18653/v1/2021.inlg-1.40
Cite (ACL):
Ruslan Yermakov, Nicholas Drago, and Angelo Ziletti. 2021. Biomedical Data-to-Text Generation via Fine-Tuning Transformers. In Proceedings of the 14th International Conference on Natural Language Generation, pages 364–370, Aberdeen, Scotland, UK. Association for Computational Linguistics.
Cite (Informal):
Biomedical Data-to-Text Generation via Fine-Tuning Transformers (Yermakov et al., INLG 2021)
PDF:
https://aclanthology.org/2021.inlg-1.40.pdf
Code:
bayer-science-for-a-better-life/data2text-bioleaflets
Data:
BioLeaflets