@inproceedings{azarpanah-farhadloo-2021-measuring,
  author    = {Azarpanah, Hossein and
               Farhadloo, Mohsen},
  title     = {Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?},
  editor    = {Pruksachatkun, Yada and
               Ramakrishna, Anil and
               Chang, Kai-Wei and
               Krishna, Satyapriya and
               Dhamala, Jwala and
               Guha, Tanaya and
               Ren, Xiang},
  booktitle = {Proceedings of the First Workshop on Trustworthy Natural Language Processing},
  month     = jun,
  year      = {2021},
  address   = {Online},
  publisher = {Association for Computational Linguistics},
  pages     = {8--14},
  doi       = {10.18653/v1/2021.trustnlp-1.2},
  url       = {https://aclanthology.org/2021.trustnlp-1.2},
  abstract  = {Word embeddings are widely used in Natural Language Processing (NLP) for a vast range of applications. However, it has been consistently proven that these embeddings reflect the same human biases that exist in the data used to train them. Most of the introduced bias indicators to reveal word embeddings{'} bias are average-based indicators based on the cosine similarity measure. In this study, we examine the impacts of different similarity measures as well as other descriptive techniques than averaging in measuring the biases of contextual and non-contextual word embeddings. We show that the extent of revealed biases in word embeddings depends on the descriptive statistics and similarity measures used to measure the bias. We found that over the ten categories of word embedding association tests, Mahalanobis distance reveals the smallest bias, and Euclidean distance reveals the largest bias in word embeddings. In addition, the contextual models reveal less severe biases than the non-contextual word embedding models.},
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="azarpanah-farhadloo-2021-measuring">
<titleInfo>
<title>Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hossein</namePart>
<namePart type="family">Azarpanah</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohsen</namePart>
<namePart type="family">Farhadloo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2021-06</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the First Workshop on Trustworthy Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yada</namePart>
<namePart type="family">Pruksachatkun</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anil</namePart>
<namePart type="family">Ramakrishna</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kai-Wei</namePart>
<namePart type="family">Chang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Satyapriya</namePart>
<namePart type="family">Krishna</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jwala</namePart>
<namePart type="family">Dhamala</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanaya</namePart>
<namePart type="family">Guha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiang</namePart>
<namePart type="family">Ren</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Online</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Word embeddings are widely used in Natural Language Processing (NLP) for a vast range of applications. However, it has been consistently proven that these embeddings reflect the same human biases that exist in the data used to train them. Most of the introduced bias indicators to reveal word embeddings’ bias are average-based indicators based on the cosine similarity measure. In this study, we examine the impacts of different similarity measures as well as other descriptive techniques than averaging in measuring the biases of contextual and non-contextual word embeddings. We show that the extent of revealed biases in word embeddings depends on the descriptive statistics and similarity measures used to measure the bias. We found that over the ten categories of word embedding association tests, Mahalanobis distance reveals the smallest bias, and Euclidean distance reveals the largest bias in word embeddings. In addition, the contextual models reveal less severe biases than the non-contextual word embedding models.</abstract>
<identifier type="citekey">azarpanah-farhadloo-2021-measuring</identifier>
<identifier type="doi">10.18653/v1/2021.trustnlp-1.2</identifier>
<location>
<url>https://aclanthology.org/2021.trustnlp-1.2</url>
</location>
<part>
<date>2021-06</date>
<extent unit="page">
<start>8</start>
<end>14</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?
%A Azarpanah, Hossein
%A Farhadloo, Mohsen
%Y Pruksachatkun, Yada
%Y Ramakrishna, Anil
%Y Chang, Kai-Wei
%Y Krishna, Satyapriya
%Y Dhamala, Jwala
%Y Guha, Tanaya
%Y Ren, Xiang
%S Proceedings of the First Workshop on Trustworthy Natural Language Processing
%D 2021
%8 June
%I Association for Computational Linguistics
%C Online
%F azarpanah-farhadloo-2021-measuring
%X Word embeddings are widely used in Natural Language Processing (NLP) for a vast range of applications. However, it has been consistently proven that these embeddings reflect the same human biases that exist in the data used to train them. Most of the introduced bias indicators to reveal word embeddings’ bias are average-based indicators based on the cosine similarity measure. In this study, we examine the impacts of different similarity measures as well as other descriptive techniques than averaging in measuring the biases of contextual and non-contextual word embeddings. We show that the extent of revealed biases in word embeddings depends on the descriptive statistics and similarity measures used to measure the bias. We found that over the ten categories of word embedding association tests, Mahalanobis distance reveals the smallest bias, and Euclidean distance reveals the largest bias in word embeddings. In addition, the contextual models reveal less severe biases than the non-contextual word embedding models.
%R 10.18653/v1/2021.trustnlp-1.2
%U https://aclanthology.org/2021.trustnlp-1.2
%U https://doi.org/10.18653/v1/2021.trustnlp-1.2
%P 8-14
Markdown (Informal)
[Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?](https://aclanthology.org/2021.trustnlp-1.2) (Azarpanah & Farhadloo, TrustNLP 2021)
ACL