ÚFAL Submission for SIGTYP Supervised Cognate Detection Task

Tomasz Limisiewicz


Abstract
In this work, I present the ÚFAL submission for the supervised task of detecting cognates and derivatives. Cognates are word pairs in different languages that share an origin in an earlier attested form in an ancestral language, while derivatives come directly from another language. For the task, I developed a gradient-boosted tree classifier trained on linguistic and statistical features. The solution ranked first of the two submitted systems, with an 87% F1 score on the test split. This write-up gives insight into the system and shows the importance of linguistic features and character-level statistics for the task.
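As a rough illustration of what character-level statistics for a word pair can look like, the sketch below computes a few string-similarity features using only the Python standard library. The feature names, the helper functions (`edit_distance`, `char_bigrams`, `pair_features`), and the choice of features are assumptions made for this example; they are not taken from the paper and should not be read as the system's actual feature set.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_bigrams(word: str) -> Counter:
    """Multiset of character bigrams, with '#' marking word boundaries."""
    padded = f"#{word}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def pair_features(w1: str, w2: str) -> dict:
    """Hypothetical character-level features for one word pair."""
    w1, w2 = w1.lower(), w2.lower()
    longest = max(len(w1), len(w2), 1)
    b1, b2 = char_bigrams(w1), char_bigrams(w2)
    shared = sum((b1 & b2).values())  # multiset intersection size
    total = sum((b1 | b2).values())   # multiset union size
    return {
        "norm_edit_distance": edit_distance(w1, w2) / longest,
        "bigram_jaccard": shared / total if total else 0.0,
        "length_diff": abs(len(w1) - len(w2)),
        "same_first_char": float(w1[:1] == w2[:1]),
    }


if __name__ == "__main__":
    print(pair_features("night", "nacht"))
```

Feature dictionaries of this kind, one per word pair, could then be converted to numeric vectors and passed to any gradient-boosted tree implementation, alongside whatever linguistic features are available.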
Anthology ID:
2023.sigtyp-1.14
Volume:
Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Lisa Beinborn, Koustava Goswami, Saliha Muradoğlu, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Edoardo M. Ponti, Ryan Cotterell, Ekaterina Vylomova
Venue:
SIGTYP
Publisher:
Association for Computational Linguistics
Pages:
132–136
URL:
https://aclanthology.org/2023.sigtyp-1.14
DOI:
10.18653/v1/2023.sigtyp-1.14
Cite (ACL):
Tomasz Limisiewicz. 2023. ÚFAL Submission for SIGTYP Supervised Cognate Detection Task. In Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 132–136, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
ÚFAL Submission for SIGTYP Supervised Cognate Detection Task (Limisiewicz, SIGTYP 2023)
PDF:
https://aclanthology.org/2023.sigtyp-1.14.pdf
Video:
https://aclanthology.org/2023.sigtyp-1.14.mp4