Compiling a List of Frequently Used Setswana Words for Developing Readability Measures

Johannes Sibeko


Abstract
This paper addresses the need for improved readability assessment in Setswana by compiling a list of frequently used Setswana words. The end goal is to use this list when adapting traditional readability measures that rely on frequently used words, such as the Dale-Chall index, to Setswana. The initial list is developed with corpus-based methods, using frequency lists obtained from five sets of corpora, and is then refined manually. The analysis section discusses the challenges encountered while developing the final list, including the presence of non-Setswana words, proper names, unexpected terms, and spelling variations. It also explains the key decisions taken, such as retaining contemporary terms and accepting diverse spelling variations, which balance linguistic authenticity against readability. The paper contributes to the discourse on text readability in indigenous Southern African languages, lays a foundation for tailored literacy initiatives, and serves as a starting point for adapting traditional frequency-list-based readability measures to Setswana.
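To make the idea in the abstract concrete, the sketch below shows one way a frequency-based word list could feed a Dale-Chall-style score. It is an illustration only, not the author's pipeline: the function names, the summing of raw counts across corpora, the regex tokeniser, and the toy counts are all assumptions, and the coefficients are those of the English New Dale-Chall formula, which would need recalibrating for Setswana.

```python
import re
from collections import Counter


def merge_frequency_lists(lists, top_n):
    """Sum per-corpus word counts and return the top_n words as a set.

    Assumption: raw counts are simply added across corpora; the paper
    may weight or filter its corpora differently.
    """
    total = Counter()
    for counts in lists:
        total.update(counts)  # Counter.update adds counts, it does not replace them
    return {word for word, _ in total.most_common(top_n)}


def dale_chall_score(text, familiar_words):
    """Dale-Chall-style score using English coefficients (not calibrated for Setswana)."""
    # Naive sentence and word splitting; a real system would need a proper tokeniser.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    difficult = sum(1 for w in words if w not in familiar_words)
    pct_difficult = 100.0 * difficult / len(words)
    score = 0.1579 * pct_difficult + 0.0496 * (len(words) / len(sentences))
    if pct_difficult > 5.0:
        score += 3.6365  # adjustment applied when more than 5% of words are unfamiliar
    return score


# Toy counts, invented for illustration; they are not taken from the paper's corpora.
corpus_counts = [{"go": 5, "le": 3, "buka": 1}, {"go": 2, "ka": 4}]
familiar = merge_frequency_lists(corpus_counts, top_n=2)
print(sorted(familiar))
print(dale_chall_score("go ka. go le.", familiar))
```

Keeping the familiar-word list as a plain set means the manual refinement described in the abstract (removing proper names or non-Setswana words, or admitting spelling variants) can be applied with ordinary set operations before scoring.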
Anthology ID:
2024.rail-1.5
Volume:
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Venues:
RAIL | WS
Publisher:
ELRA and ICCL
Pages:
37–44
URL:
https://aclanthology.org/2024.rail-1.5
Cite (ACL):
Johannes Sibeko. 2024. Compiling a List of Frequently Used Setswana Words for Developing Readability Measures. In Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024, pages 37–44, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Compiling a List of Frequently Used Setswana Words for Developing Readability Measures (Sibeko, RAIL-WS 2024)
PDF:
https://aclanthology.org/2024.rail-1.5.pdf