Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness

Kim-Anh Nguyen; Sabine Schulte im Walde; Ngoc Thang Vu

doi:10.18653/v1/N18-2032

Abstract

We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity: ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to the respective English datasets.

Anthology ID: N18-2032
Volume: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Month: June
Year: 2018
Address: New Orleans, Louisiana
Editors: Marilyn Walker, Heng Ji, Amanda Stent
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 199–205
URL: https://aclanthology.org/N18-2032/
DOI: 10.18653/v1/N18-2032
Cite (ACL): Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. 2018. Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 199–205, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal): Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness (Nguyen et al., NAACL 2018)
PDF: https://aclanthology.org/N18-2032.pdf
