Yorùbá Dependency Treebank (YTB)

Olájídé Ishola, Daniel Zeman


Abstract
Low-resource languages present enormous NLP opportunities as well as varying degrees of difficulty. The newly released treebank of hand-annotated parts of the Yoruba Bible provides an avenue for dependency analysis of the Yoruba language, that is, the application of a new grammar formalism to the language. In this paper, we discuss our choice of Universal Dependencies, the important dependency annotation decisions considered in creating the first annotation guidelines for Yoruba, and the results of our parsing experiments. We also lay the foundation for future incorporation of other domains with an initial test on Yoruba Wikipedia articles, and highlight future directions for the rapid expansion of the treebank.
Anthology ID:
2020.lrec-1.637
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5178–5186
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.637
Cite (ACL):
Olájídé Ishola and Daniel Zeman. 2020. Yorùbá Dependency Treebank (YTB). In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5178–5186, Marseille, France. European Language Resources Association.
Cite (Informal):
Yorùbá Dependency Treebank (YTB) (Ishola & Zeman, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.637.pdf