Cognate Detection for Historical Language Reconstruction of Proto-Sabean Languages: the Case of Ge’ez, Tigrinya, and Amharic

Elleni Sisay Temesgen, Hellina Hailu Nigatu, Fitsum Assamnew Andargie


Abstract
As languages evolve, we risk losing ancestral languages. In this paper, we explore Historical Language Reconstruction (HLR) for Proto-Sabean languages, starting with the identification of cognates: sets of words in different related languages that are derived from the same ancestral language. We (1) collect semantically related words in three Afro-Semitic languages from a three-way dictionary, (2) work with linguists to identify cognates and reconstruct the proto-form of the cognates, and (3) experiment with three automatic cognate detection methods to extract cognates from the semantically related words. We then experiment with in-context learning with GPT-4o to generate the proto-language from the cognates and use Sequence-to-Sequence (Seq2Seq) models for HLR.
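
The abstract does not name the three automatic cognate detection methods, so the following is only an illustrative sketch of one widely used baseline, not the authors' approach: flag two semantically related words as a cognate candidate when their normalized edit distance falls below a threshold. The function names, the 0.5 threshold, and the example strings (made-up placeholders, not real Ge’ez, Tigrinya, or Amharic words) are all assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length, giving a value in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def is_cognate_candidate(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: similar enough surface forms count as candidates."""
    return normalized_distance(a, b) <= threshold


# Placeholder word forms, invented purely for illustration.
print(is_cognate_candidate("tabak", "tabik"))  # one substitution apart
print(is_cognate_candidate("tabak", "zumo"))   # no shared segments
```

In practice such a rule would be applied to transliterated or phonetic forms of words that share a meaning, and the threshold would need tuning against linguist-annotated cognate sets like those described above.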
Anthology ID:
2025.coling-main.496
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
7415–7422
URL:
https://aclanthology.org/2025.coling-main.496/
Cite (ACL):
Elleni Sisay Temesgen, Hellina Hailu Nigatu, and Fitsum Assamnew Andargie. 2025. Cognate Detection for Historical Language Reconstruction of Proto-Sabean Languages: the Case of Ge’ez, Tigrinya, and Amharic. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7415–7422, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Cognate Detection for Historical Language Reconstruction of Proto-Sabean Languages: the Case of Ge’ez, Tigrinya, and Amharic (Temesgen et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.496.pdf