Opening the Romance Verbal Inflection Dataset 2.0: A CLDF lexicon

Sacha Beniamine, Martin Maiden, Erich Round


Abstract
We introduce the Romance Verbal Inflection Dataset 2.0, a multilingual lexicon of Romance inflection covering 74 varieties. The lexicon provides verbal paradigm forms in broad IPA phonemic notation. Both lexemes and paradigm cells are organized to reflect cognacy. Such multilingual inflected lexicons, annotated for two dimensions of cognacy, are necessary to study the evolution of inflectional paradigms and to test linguistic hypotheses systematically. However, these resources seldom exist, and when they do, they are not usually encoded in computationally usable ways. The Oxford Online Database of Romance Verb Morphology provides this kind of information, but it is no longer maintained and is only available as a web service with no machine-readable interface. We collect its data, then clean and correct it for consistency using both heuristics and expert annotator judgements. Most resources used to study language evolution computationally rely strictly on multilingual contemporary information and lack information about prior stages of the languages. To provide such information, we augment the database with Latin paradigms from the LatInFlexi lexicon. Finally, to make it widely available, the resource is released under a GPLv3 license in CLDF format.
Anthology ID:
2020.lrec-1.370
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3027–3035
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.370
Cite (ACL):
Sacha Beniamine, Martin Maiden, and Erich Round. 2020. Opening the Romance Verbal Inflection Dataset 2.0: A CLDF lexicon. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3027–3035, Marseille, France. European Language Resources Association.
Cite (Informal):
Opening the Romance Verbal Inflection Dataset 2.0: A CLDF lexicon (Beniamine et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.370.pdf