A Corpus-Based List of Frequently Used Words in Sesotho

Johannes Sibeko, Orphée De Clercq


Abstract
This paper describes the SpeechReporting Corpus, an online collection of corpora annotated for a range of discourse phenomena. The corpora contain folktales from 7 lesser-studied West African languages. Apart from its value for theoretical linguistics, especially for the study of reported speech, the database is an important resource for the preservation of intangible cultural heritage of minority languages and the development and testing of cross-linguistically applicable computational tools.
Anthology ID:
2023.rail-1.5
Volume:
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Rooweither Mabuya, Don Mthobela, Mmasibidi Setaka, Menno Van Zaanen
Venue:
RAIL
Publisher:
Association for Computational Linguistics
Pages:
32–41
URL:
https://aclanthology.org/2023.rail-1.5
DOI:
10.18653/v1/2023.rail-1.5
Cite (ACL):
Johannes Sibeko and Orphée De Clercq. 2023. A Corpus-Based List of Frequently Used Words in Sesotho. In Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023), pages 32–41, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
A Corpus-Based List of Frequently Used Words in Sesotho (Sibeko & De Clercq, RAIL 2023)
PDF:
https://aclanthology.org/2023.rail-1.5.pdf
Video:
https://aclanthology.org/2023.rail-1.5.mp4