CogNet: A Large-Scale Cognate Database

Khuyagbaatar Batsuren, Gabor Bella, Fausto Giunchiglia


Abstract
This paper introduces CogNet, a new, large-scale lexical database that provides cognates (words of common origin and meaning) across languages. The database currently contains 3.1 million cognate pairs across 338 languages using 35 writing systems. The paper also describes the automated method by which cognates were computed from publicly available wordnets, with an accuracy evaluated at 94%. Finally, it presents statistics about the cognate data and some initial insights into it, hinting at a possible future exploitation of the resource by various fields of linguistics.
Anthology ID:
P19-1302
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3136–3145
URL:
https://aclanthology.org/P19-1302/
DOI:
10.18653/v1/P19-1302
Cite (ACL):
Khuyagbaatar Batsuren, Gabor Bella, and Fausto Giunchiglia. 2019. CogNet: A Large-Scale Cognate Database. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3136–3145, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
CogNet: A Large-Scale Cognate Database (Batsuren et al., ACL 2019)
PDF:
https://aclanthology.org/P19-1302.pdf
Software:
P19-1302.Software.zip
Code:
kbatsuren/cognet