Towards Augmenting Lexical Resources for Slang and African American English

Alyssa Hwang, William R. Frey, Kathleen McKeown


Abstract
Researchers in natural language processing have developed large, robust resources for understanding formal Standard American English (SAE), but we lack similar resources for variations of English, such as slang and African American English (AAE). In this work, we use word embeddings and clustering algorithms to group semantically similar words in three datasets, two of which contain a high incidence of slang and AAE. Since high-quality clusters would contain related words, we could also infer the meaning of an unfamiliar word based on the meanings of words clustered with it. After clustering, we compute precision and recall scores using WordNet and ConceptNet as gold standards and show that these scores are unimportant when the given resources do not fully represent slang and AAE. Amazon Mechanical Turk and expert evaluations show that clusters with low precision can still be considered high quality, and we propose the new Cluster Split Score as a metric for machine-generated clusters. These contributions emphasize the gap in natural language processing research for variations of English and motivate further work to close it.
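The evaluation idea in the abstract, scoring clusters against a gold-standard resource, can be illustrated with a minimal sketch. The abstract does not give the exact precision and recall definitions, so the pairwise formulation below is an assumption and not the authors' method. The function names, the toy clusters, and the toy gold pairs are all hypothetical; in practice the gold pairs would come from a resource such as WordNet or ConceptNet.

```python
from itertools import combinations


def cluster_pairs(clusters):
    """Return the set of unordered word pairs that share a cluster."""
    pairs = set()
    for words in clusters:
        # Sort so each pair has one canonical (alphabetical) order.
        for a, b in combinations(sorted(set(words)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_precision_recall(clusters, gold_pairs):
    """Pairwise precision and recall (an assumed formulation).

    precision = clustered pairs that the gold resource also relates
                / all clustered pairs
    recall    = clustered pairs that the gold resource also relates
                / all gold pairs
    """
    predicted = cluster_pairs(clusters)
    gold = {tuple(sorted(pair)) for pair in gold_pairs}
    true_positives = predicted & gold
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: two clusters that mix standard and slang terms.
clusters = [["car", "auto", "whip"], ["money", "cash", "bread"]]

# Hypothetical gold standard that only knows the standard senses.
gold_pairs = [("car", "auto"), ("money", "cash"), ("cash", "dough")]

precision, recall = pairwise_precision_recall(clusters, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setup every pair involving "whip" or "bread" counts against precision only because the gold standard has no entry for the slang sense, which mirrors the abstract's point that low scores against such resources need not mean low-quality clusters.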
Anthology ID:
2020.vardial-1.15
Volume:
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venues:
COLING | VarDial
Publisher:
International Committee on Computational Linguistics (ICCL)
Pages:
160–172
URL:
https://aclanthology.org/2020.vardial-1.15
Cite (ACL):
Alyssa Hwang, William R. Frey, and Kathleen McKeown. 2020. Towards Augmenting Lexical Resources for Slang and African American English. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 160–172, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
Cite (Informal):
Towards Augmenting Lexical Resources for Slang and African American English (Hwang et al., VarDial 2020)
PDF:
https://aclanthology.org/2020.vardial-1.15.pdf
Data
ConceptNet