What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus

Cecilia Domingo, Tatiana Gonzalez-Ferrero, Itziar Gonzalez-Dios


Abstract
Natural Language Processing tools and resources have so far been created and trained mainly for standard varieties of language. Now that large amounts of data are gathered from social media, other varieties and registers also need to be processed, and these may present additional challenges and difficulties. In this work, we focus on English and present a preliminary analysis that compares the TwitterAAE corpus, which is annotated for ethnicity, with WordNet, quantifying and explaining the online language that WordNet misses.
Anthology ID:
2021.gwc-1.27
Volume:
Proceedings of the 11th Global Wordnet Conference
Month:
January
Year:
2021
Address:
University of South Africa (UNISA)
Venues:
EACL | GWC
Publisher:
Global Wordnet Association
Pages:
234–242
URL:
https://aclanthology.org/2021.gwc-1.27
Cite (ACL):
Cecilia Domingo, Tatiana Gonzalez-Ferrero, and Itziar Gonzalez-Dios. 2021. What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus. In Proceedings of the 11th Global Wordnet Conference, pages 234–242, University of South Africa (UNISA). Global Wordnet Association.
Cite (Informal):
What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus (Domingo et al., GWC 2021)
PDF:
https://aclanthology.org/2021.gwc-1.27.pdf