Vy Phan
2023
Enriching Biomedical Knowledge for Low-resource Language Through Large-scale Translation
Long Phan | Tai Dang | Hieu Tran | Trieu H. Trinh | Vy Phan | Lam D. Chau | Minh-Thang Luong
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages such as Vietnamese. In this paper, we use a state-of-the-art English-Vietnamese translation model to translate and produce both pretraining and supervised data in the biomedical domain. Thanks to this large-scale translation, we introduce ViPubmedT5, a pretrained encoder-decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubmedT5 demonstrates state-of-the-art results on two biomedical benchmarks, in summarization and acronym disambiguation. Further, we release ViMedNLI, a new NLP task in Vietnamese translated from MedNLI using the recently released English-Vietnamese translation model and carefully refined by human experts, with evaluations of existing methods against ViPubmedT5.