StereoKG: Data-Driven Knowledge Graph Construction For Cultural Knowledge and Stereotypes

Awantee Deshpande, Dana Ruiter, Marius Mosbach, Dietrich Klakow


Abstract
Analyzing ethnic or religious bias is important for improving fairness, accountability, and transparency of natural language processing models. However, many techniques rely on human-compiled lists of bias terms, which are expensive to create and are limited in coverage. In this study, we present a fully data-driven pipeline for generating a knowledge graph (KG) of cultural knowledge and stereotypes. Our resulting KG covers 5 religious groups and 5 nationalities and can easily be extended to more entities. Our human evaluation shows that the majority (59.2%) of non-singleton entries are coherent and complete stereotypes. We further show that performing intermediate masked language model training on the verbalized KG leads to a higher level of cultural awareness in the model and has the potential to increase classification performance on knowledge-crucial samples on a related task, i.e., hate speech detection.
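To make "intermediate masked language model training on the verbalized KG" concrete, here is a minimal sketch of the data-preparation idea only. It is not the authors' pipeline: the triple format, the `verbalize` and `mask_tokens` helpers, the 15% masking rate, and the placeholder triples are all assumptions introduced for illustration. The sketch also masks whole whitespace-separated tokens, whereas real masked language model training usually operates on subword pieces.

```python
import random

# Hypothetical placeholder triples (subject, predicate, object); not taken from StereoKG.
TRIPLES = [
    ("group_a", "celebrate", "festival_x"),
    ("group_b", "eat", "dish_y"),
]

def verbalize(triple):
    """Turn a (subject, predicate, object) triple into a plain sentence."""
    subject, predicate, obj = triple
    return f"{subject} {predicate} {obj}."

def mask_tokens(sentence, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Replace each whitespace-separated token with mask_token with probability mask_prob."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    tokens = sentence.split()
    masked = [mask_token if rng.random() < mask_prob else tok for tok in tokens]
    return " ".join(masked)

if __name__ == "__main__":
    for triple in TRIPLES:
        sentence = verbalize(triple)
        print(sentence, "->", mask_tokens(sentence, mask_prob=0.5))
```

Sentence pairs of this kind (masked input, original sentence as target) are what a masked language model would be trained to reconstruct before fine-tuning on a downstream task such as hate speech detection.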
Anthology ID:
2022.woah-1.7
Volume:
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
Month:
July
Year:
2022
Address:
Seattle, Washington (Hybrid)
Venue:
WOAH
Publisher:
Association for Computational Linguistics
Pages:
67–78
URL:
https://aclanthology.org/2022.woah-1.7
DOI:
10.18653/v1/2022.woah-1.7
Cite (ACL):
Awantee Deshpande, Dana Ruiter, Marius Mosbach, and Dietrich Klakow. 2022. StereoKG: Data-Driven Knowledge Graph Construction For Cultural Knowledge and Stereotypes. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 67–78, Seattle, Washington (Hybrid). Association for Computational Linguistics.
Cite (Informal):
StereoKG: Data-Driven Knowledge Graph Construction For Cultural Knowledge and Stereotypes (Deshpande et al., WOAH 2022)
PDF:
https://aclanthology.org/2022.woah-1.7.pdf
Video:
https://aclanthology.org/2022.woah-1.7.mp4
Code:
uds-lsv/stereokg
Data:
CoLA, ConceptNet, Hate Speech, OLID