A Morphological Lexicon of Esperanto with Morpheme Frequencies

Eckhard Bick


Abstract
This paper discusses the internal structure of complex Esperanto words (CWs). Using a morphological analyzer, possible affixation and compounding were checked for over 50,000 Esperanto lexemes against a list of 17,000 root words. Morpheme boundaries in the resulting analyses were then checked manually, creating a CW dictionary of 28,000 words, representing 56.4% of the lexicon, or 19.4% of corpus tokens. The error rate of the EspGram morphological analyzer for new corpus CWs was 4.3% for types and 6.4% for tokens, with a recall of almost 100%; wrong or spurious boundaries were more common than missing ones. For pedagogical purposes, a morpheme frequency dictionary was constructed from a 16-million-word corpus, confirming the importance of agglutinative derivational morphemes in the Esperanto lexicon. Finally, as a means of reducing the morphological ambiguity of CWs, we provide POS likelihoods for Esperanto suffixes.
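As an illustration of what a morpheme frequency dictionary involves, the following minimal sketch sums corpus token counts over the morphemes of manually segmented words. It is not the paper's implementation: the function name `morpheme_frequencies`, the toy lexicon, the token counts, and the choice to count word-class endings as morphemes are all assumptions made for the example.

```python
from collections import Counter


def morpheme_frequencies(segmented_lexicon, token_counts):
    """Sum corpus token counts over the morphemes of each segmented word.

    segmented_lexicon: dict mapping a word to its list of morphemes (hypothetical format).
    token_counts: dict mapping a word to its number of corpus tokens (hypothetical data).
    """
    freq = Counter()
    for word, morphemes in segmented_lexicon.items():
        count = token_counts.get(word, 0)  # words absent from the corpus contribute 0
        for morpheme in morphemes:
            freq[morpheme] += count
    return freq


# Toy data, invented for illustration only.
segmented_lexicon = {
    "malsanulejo": ["mal", "san", "ul", "ej", "o"],  # hospital
    "lernejo": ["lern", "ej", "o"],                  # school
    "malbona": ["mal", "bon", "a"],                  # bad
}
token_counts = {"malsanulejo": 3, "lernejo": 10, "malbona": 7}

freq = morpheme_frequencies(segmented_lexicon, token_counts)
for morpheme, count in freq.most_common():
    print(morpheme, count)
```

In this toy setting the suffix `ej` and the prefix `mal` accumulate counts from every word that contains them, which is the kind of aggregation that lets derivational morphemes be ranked by corpus frequency.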
Anthology ID:
L16-1171
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1075–1078
URL:
https://aclanthology.org/L16-1171
Cite (ACL):
Eckhard Bick. 2016. A Morphological Lexicon of Esperanto with Morpheme Frequencies. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1075–1078, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Morphological Lexicon of Esperanto with Morpheme Frequencies (Bick, LREC 2016)
PDF:
https://aclanthology.org/L16-1171.pdf