Towards standardized inflected lexicons for the Finnic languages

Jules Bouton


Abstract
We introduce three richly annotated lexicons of nouns for Livonian, standard Finnish and Livvi Karelian. Our datasets are distributed in the machine-readable Paralex standard, which consists of linked CSV tables described in a JSON metadata file. We built on the morphological dictionary of Livonian, the VepKar database and the Omorfi software to provide inflected forms. All noun forms were transcribed with grapheme-to-phoneme conversion rules and the paradigms annotated for both overabundance and defectivity. The resulting datasets are usable for quantitative studies of morphological systems and for qualitative investigations. They are linked to the original resources and can be easily updated.
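As a rough illustration of how annotations for overabundance and defectivity might be read from a table of inflected forms, here is a minimal sketch. It is not the author's pipeline or the Paralex tooling. The column names (`form_id`, `lexeme`, `cell`, `orth_form`), the `#DEF#` marker for a missing form, and every row of the toy table are assumptions made for illustration; none of the rows comes from the published datasets.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows shaped like a forms table (one row per inflected form).
# Lexemes, cells and forms are invented placeholders, not real data.
FORMS_CSV = """form_id,lexeme,cell,orth_form
f1,lexA,nom.sg,formA1
f2,lexA,gen.sg,formA2
f3,lexB,gen.pl,formB1
f4,lexB,gen.pl,formB2
f5,lexC,nom.sg,formC1
f6,lexC,nom.pl,#DEF#
"""

DEFECTIVE = "#DEF#"  # assumed marker for a cell with no attested form


def summarize(forms_file):
    """Group forms by (lexeme, cell); report overabundant and defective cells."""
    by_cell = defaultdict(list)
    for row in csv.DictReader(forms_file):
        by_cell[(row["lexeme"], row["cell"])].append(row["orth_form"])

    overabundant = {}
    defective = set()
    for key, forms in by_cell.items():
        realized = sorted({f for f in forms if f != DEFECTIVE})
        if not realized:
            # The cell exists in the paradigm but has no realized form.
            defective.add(key)
        elif len(realized) > 1:
            # Several distinct forms compete for the same cell.
            overabundant[key] = realized
    return overabundant, defective


overabundant, defective = summarize(io.StringIO(FORMS_CSV))
print(overabundant)
print(defective)
```

With a real dataset, the `io.StringIO` object would be replaced by an open file handle on the forms table named in the JSON metadata file, and the column names adjusted to match that file.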
Anthology ID:
2024.iwclul-1.7
Volume:
Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages
Month:
November
Year:
2024
Address:
Helsinki, Finland
Editors:
Mika Hämäläinen, Flammie Pirinen, Melany Macias, Mario Crespo Avila
Venue:
IWCLUL
SIG:
SIGUR
Publisher:
Association for Computational Linguistics
Pages:
59–66
URL:
https://aclanthology.org/2024.iwclul-1.7
Cite (ACL):
Jules Bouton. 2024. Towards standardized inflected lexicons for the Finnic languages. In Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages, pages 59–66, Helsinki, Finland. Association for Computational Linguistics.
Cite (Informal):
Towards standardized inflected lexicons for the Finnic languages (Bouton, IWCLUL 2024)
PDF:
https://aclanthology.org/2024.iwclul-1.7.pdf