Enriching Biomedical Knowledge for Low-resource Language Through Large-scale Translation

Long Phan, Tai Dang, Hieu Tran, Trieu H. Trinh, Vy Phan, Lam D. Chau, Minh-Thang Luong


Abstract
Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages such as Vietnamese. In this paper, we use a state-of-the-art English-Vietnamese translation model to translate and produce both pretraining and supervised data in the biomedical domain. Building on this large-scale translation, we introduce ViPubMedT5, a pretrained encoder-decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubMedT5 achieves state-of-the-art results on two biomedical benchmarks, one in summarization and one in acronym disambiguation. Further, we release ViMedNLI, a new NLP task in Vietnamese translated from MedNLI using the recently released English-Vietnamese translation model and carefully refined by human experts, with evaluations of existing methods against ViPubMedT5.
Anthology ID:
2023.eacl-main.228
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
3131–3142
URL:
https://aclanthology.org/2023.eacl-main.228
DOI:
10.18653/v1/2023.eacl-main.228
Cite (ACL):
Long Phan, Tai Dang, Hieu Tran, Trieu H. Trinh, Vy Phan, Lam D. Chau, and Minh-Thang Luong. 2023. Enriching Biomedical Knowledge for Low-resource Language Through Large-scale Translation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3131–3142, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Enriching Biomedical Knowledge for Low-resource Language Through Large-scale Translation (Phan et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.228.pdf
Video:
https://aclanthology.org/2023.eacl-main.228.mp4