The First Universal Dependency Treebank for Tswana: Tswana-Popapolelo

Tanja Gaustad, Ansu Berg, Rigardt Pretorius, Roald Eiselen


Abstract
This paper presents the first publicly available UD treebank for Tswana, Tswana-Popapolelo. The data consists of the 20 Cairo CICLing sentences translated into Tswana. After pre-processing these sentences by tagging them with detailed POS (XPOS) and converting those tags to universal POS (UPOS), we annotated the data with dependency relations, documenting our decisions for language-specific constructions. The linguistic issues encountered are described in detail, as this is the first application of the UD framework to produce a dependency treebank for the Bantu language family in general and for Tswana specifically.
Anthology ID:
2024.rail-1.7
Volume:
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Venues:
RAIL | WS
Publisher:
ELRA and ICCL
Pages:
55–65
URL:
https://aclanthology.org/2024.rail-1.7
Cite (ACL):
Tanja Gaustad, Ansu Berg, Rigardt Pretorius, and Roald Eiselen. 2024. The First Universal Dependency Treebank for Tswana: Tswana-Popapolelo. In Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024, pages 55–65, Torino, Italia. ELRA and ICCL.
Cite (Informal):
The First Universal Dependency Treebank for Tswana: Tswana-Popapolelo (Gaustad et al., RAIL-WS 2024)
PDF:
https://aclanthology.org/2024.rail-1.7.pdf