Elleni Sisay Temesgen
2025
Cognate Detection for Historical Language Reconstruction of Proto-Sabean Languages: the Case of Ge’ez, Tigrinya, and Amharic
Elleni Sisay Temesgen | Hellina Hailu Nigatu | Fitsum Assamnew Andargie
Proceedings of the 31st International Conference on Computational Linguistics
As languages evolve, we risk losing ancestral languages. In this paper, we explore Historical Language Reconstruction (HLR) for Proto-Sabean languages, starting with the identification of cognates: sets of words in different related languages that are derived from the same ancestral language. We (1) collect semantically related words in three Afro-Semitic languages from a three-way dictionary, (2) work with linguists to identify cognates and reconstruct the proto-form of the cognates, and (3) experiment with three automatic cognate detection methods and extract cognates from the semantically related words. We then experiment with in-context learning with GPT-4o to generate the proto-language from the cognates and use Sequence-to-Sequence (Seq2Seq) models for HLR.