%0 Conference Proceedings
%T AfroLID: A Neural Language Identification Tool for African Languages
%A Adebara, Ife
%A Elmadany, AbdelRahim
%A Abdul-Mageed, Muhammad
%A Inciarte, Alcides
%Y Goldberg, Yoav
%Y Kozareva, Zornitsa
%Y Zhang, Yue
%S Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates
%F adebara-etal-2022-afrolid
%X Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world’s 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural LID toolkit for 517 African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 language families utilizing five orthographic systems. When evaluated on our blind Test set, AfroLID achieves 95.89 F_1-score. We also compare AfroLID to five existing LID tools that each cover a small number of African languages, finding it to outperform them on most languages. We further show the utility of AfroLID in the wild by testing it on the acutely under-served Twitter domain. Finally, we offer a number of controlled case studies and perform a linguistically-motivated error analysis that allow us to both showcase AfroLID’s powerful capabilities and limitations.
%R 10.18653/v1/2022.emnlp-main.128
%U https://aclanthology.org/2022.emnlp-main.128
%U https://doi.org/10.18653/v1/2022.emnlp-main.128
%P 1958-1981