ViHealthNLI: A Dataset for Vietnamese Natural Language Inference in Healthcare

Huyen Nguyen, Quyen The Ngo, Thanh-Ha Do, Tuan-Anh Hoang


Abstract
This paper introduces ViHealthNLI, a large dataset for natural language inference in Vietnamese. Unlike similar Vietnamese datasets, ours is specific to the healthcare domain. We conducted an exploratory analysis to characterize the dataset and evaluated state-of-the-art methods on it. Our findings indicate that the dataset poses significant challenges while also holding promise for further advanced research and the creation of practical applications.
Anthology ID:
2024.sigul-1.48
Volume:
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venues:
SIGUL | WS
Publisher:
ELRA and ICCL
Pages:
404–409
URL:
https://aclanthology.org/2024.sigul-1.48
Cite (ACL):
Huyen Nguyen, Quyen The Ngo, Thanh-Ha Do, and Tuan-Anh Hoang. 2024. ViHealthNLI: A Dataset for Vietnamese Natural Language Inference in Healthcare. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 404–409, Torino, Italia. ELRA and ICCL.
Cite (Informal):
ViHealthNLI: A Dataset for Vietnamese Natural Language Inference in Healthcare (Nguyen et al., SIGUL-WS 2024)
PDF:
https://aclanthology.org/2024.sigul-1.48.pdf