Automatic Reconstruction of Missing Romanian Cognates and Unattested Latin Words

Alina Maria Ciobanu, Liviu P. Dinu, Laurentiu Zoicas


Abstract
Producing related words is a key concern in historical linguistics. Given an input word, the task is to automatically produce either its proto-word, a cognate pair, or a modern word derived from it. In this paper, we apply a method for producing related words based on sequence labeling, aiming to fill in the gaps in incomplete cognate sets in Romance languages with Latin etymology (producing Romanian cognates that are missing) and to reconstruct unattested Latin words. We further investigate an ensemble-based aggregation for combining and re-ranking the word productions of multiple languages.
Anthology ID:
2020.lrec-1.394
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3226–3231
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.394
Cite (ACL):
Alina Maria Ciobanu, Liviu P. Dinu, and Laurentiu Zoicas. 2020. Automatic Reconstruction of Missing Romanian Cognates and Unattested Latin Words. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3226–3231, Marseille, France. European Language Resources Association.
Cite (Informal):
Automatic Reconstruction of Missing Romanian Cognates and Unattested Latin Words (Ciobanu et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.394.pdf