Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction

Liviu P Dinu, Ana Sabina Uban, Alina Maria Cristea, Ioan-Bogdan Iordache, Teodor-George Marchitan, Simona Georgescu, Laurentiu Zoicas


Abstract
We introduce a new database of cognate words and etymons for the five main Romance languages, the most comprehensive one to date. We propose a strong benchmark for the automatic reconstruction of protowords for Romance languages by applying a set of machine learning models and features to these data. The best results reach 90% accuracy in predicting the protoword of a given cognate set, surpassing existing state-of-the-art results for this task and showing that computational methods can be very useful in assisting linguists with protoword reconstruction.
Anthology ID:
2024.emnlp-main.362
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
6314–6326
URL:
https://aclanthology.org/2024.emnlp-main.362
DOI:
10.18653/v1/2024.emnlp-main.362
Cite (ACL):
Liviu P Dinu, Ana Sabina Uban, Alina Maria Cristea, Ioan-Bogdan Iordache, Teodor-George Marchitan, Simona Georgescu, and Laurentiu Zoicas. 2024. Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 6314–6326, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction (Dinu et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.362.pdf
Data:
 2024.emnlp-main.362.data.zip