Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages

Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky


Abstract
Exploiting the broad translation of the Bible into the world’s languages, we train and distribute morphosyntactic tools for approximately one thousand languages, vastly outstripping previous distributions of tools devoted to the processing of inflectional morphology. Evaluation of the tools on a subset of available inflectional dictionaries shows strong initial models, which are further improved through ensembling and dictionary-based reranking. In addition, a novel type-to-token based evaluation metric allows us to confirm that the models generalize well across rare and common forms alike.
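The abstract does not say how the type-to-token metric is computed. As a minimal sketch, assuming the idea is to contrast accuracy over distinct word forms (types) with accuracy weighted by how often each form occurs in running text (tokens), the two quantities can be compared as below. The function name `type_and_token_accuracy`, the dictionary-shaped inputs, and the UniMorph-style analysis strings are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def type_and_token_accuracy(predictions, gold, corpus_tokens):
    """Hypothetical helper: compare type-level and token-weighted accuracy.

    predictions, gold: dicts mapping a word form (type) to an analysis string.
    corpus_tokens: running-text tokens used to weight each type by frequency.
    A gold type with no prediction counts as an error.
    """
    freq = Counter(corpus_tokens)
    types = list(gold)
    if not types:
        return 0.0, 0.0
    correct = [w for w in types if predictions.get(w) == gold[w]]

    # Type accuracy: every distinct form counts once, rare or common.
    type_acc = len(correct) / len(types)

    # Token accuracy: each form counts as often as it occurs in the corpus.
    total_tokens = sum(freq[w] for w in types)
    token_acc = sum(freq[w] for w in correct) / total_tokens if total_tokens else 0.0
    return type_acc, token_acc


gold = {"walked": "walk V;PST", "walks": "walk V;PRS;3;SG", "ran": "run V;PST"}
predictions = {"walked": "walk V;PST", "walks": "walk V;PRS;3;SG", "ran": "ran N;SG"}
corpus_tokens = ["walked"] * 5 + ["walks"] * 3 + ["ran"] * 2

type_acc, token_acc = type_and_token_accuracy(predictions, gold, corpus_tokens)
print(round(type_acc, 3), round(token_acc, 3))  # 0.667 0.8
```

Under this assumed setup, token-weighted accuracy well above type accuracy means errors fall mostly on rare forms, while similar values suggest comparable performance on rare and common forms.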
Anthology ID: 2020.lrec-1.488
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 3963–3972
Language: English
URL: https://aclanthology.org/2020.lrec-1.488
Cite (ACL): Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, and David Yarowsky. 2020. Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3963–3972, Marseille, France. European Language Resources Association.
Cite (Informal): Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages (Nicolai et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.488.pdf