Tracking Semantic Change in Cognate Sets for English and Romance Languages

Ana Sabina Uban, Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Laurentiu Zoicas


Abstract
Semantic divergence in related languages is a key concern of historical linguistics. We investigate the semantic divergence of cognate pairs across English and Romance languages by means of word embeddings. To this end, we introduce a new curated dataset of cognates in all pairs of those languages. We describe the types of errors that occurred during the automated cognate identification process and manually correct them. Additionally, we label the English cognates according to their etymology, separating them into two groups: old borrowings and recent borrowings. On this curated dataset, we analyse word properties such as frequency and polysemy, and the distribution of similarity scores between cognate sets in different languages. We automatically identify different clusters of English cognates, opening a new direction of research in the analysis of cognates, borrowings, and possibly false friends in related languages.
Anthology ID:
2021.lchange-1.9
Volume:
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP | LChange
Publisher:
Association for Computational Linguistics
Pages:
64–74
URL:
https://aclanthology.org/2021.lchange-1.9
DOI:
10.18653/v1/2021.lchange-1.9
PDF:
https://aclanthology.org/2021.lchange-1.9.pdf