Dictionary-based Debiasing of Pre-trained Word Embeddings

Word embeddings trained on large corpora have been shown to encode high levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective and unbiased manner. We propose a method for debiasing pre-trained word embeddings using dictionaries, without requiring access to the original training resources or any knowledge regarding the word embedding algorithms used. Unlike prior work, our proposed method does not require the types of biases to be pre-defined in the form of word lists, and learns the constraints that must be satisfied by unbiased word embeddings automatically from dictionary definitions of the words. Specifically, we learn an encoder to generate a debiased version of an input word embedding such that it (a) retains the semantics of the pre-trained word embedding, (b) agrees with the unbiased definition of the word according to the dictionary, and (c) remains orthogonal to the vector space spanned by any biased basis vectors in the pre-trained word embedding space. Experimental results on standard benchmark datasets show that the proposed method can accurately remove unfair biases encoded in pre-trained word embeddings, while preserving useful semantics.


Introduction
Although pre-trained word embeddings are useful due to their low dimensionality and memory and compute efficiency, they have been shown to encode not only the semantics of words but also unfair discriminatory biases such as gender, racial or religious biases (Bolukbasi et al., 2016; Zhao et al., 2018a; Rudinger et al., 2018; Zhao et al., 2018b; Elazar and Goldberg, 2018; Kaneko and Bollegala, 2019).* On the other hand, human-written dictionaries act as an impartial, objective and unbiased source of word meaning. Although methods that learn word embeddings purely from dictionaries have been proposed (Tissier et al., 2017), they suffer from coverage and data sparseness issues because precompiled dictionaries neither capture the meanings of neologisms nor provide the numerous contexts found in a corpus. Consequently, prior work has shown that word embeddings learnt from large text corpora outperform those created from dictionaries in downstream NLP tasks (Alsuhaibani et al., 2019; Bollegala et al., 2016).

* Danushka Bollegala holds concurrent appointments as a Professor at University of Liverpool and as an Amazon Scholar. This paper describes work performed at the University of Liverpool and is not associated with Amazon.
We must overcome several challenges when using dictionaries to debias pre-trained word embeddings. First, not all words in the embeddings will appear in a given dictionary. Dictionaries often have limited coverage and will not cover neologisms, orthographic variants of words, etc. that are likely to appear in large corpora. A lexicalised debiasing method would generalise poorly to words not in the dictionary. Second, it is not known a priori what biases are hidden inside a set of pre-trained word embedding vectors. Depending on the source of the documents used for training the embeddings, different types of biases will be learnt and amplified to different degrees by different word embedding learning algorithms (Zhao et al., 2017).
Prior work on debiasing requires the biases to be pre-defined (Kaneko and Bollegala, 2019). For example, Hard-Debias (HD; Bolukbasi et al., 2016) and Gender-Neutral GloVe (GN-GloVe; Zhao et al., 2018b) require lists of male and female pronouns to define the gender direction. However, gender bias is only one of the many biases that exist in pre-trained word embeddings. It is inconvenient to prepare lists of words covering all the different types of biases we must remove from pre-trained word embeddings. Moreover, such pre-compiled word lists are likely to be incomplete and to cover some biases inadequately. Indeed, Gonen and Goldberg (2019) showed empirical evidence that such debiasing methods do not remove all discriminative biases from word embeddings. Unfair biases have adversely affected several NLP tasks such as machine translation (Vanmassenhove et al., 2018) and language generation (Sheng et al., 2019). Racial biases have also been shown to affect criminal prosecutions (Manzini et al., 2019) and career adverts (Lambrecht and Tucker, 2016). These findings show the difficulty of defining different biases using pre-compiled word lists, which is a requirement of previously proposed debiasing methods for static word embeddings.
We propose a method that uses a dictionary as a source of bias-free definitions of words for debiasing pre-trained word embeddings. Specifically, we learn an encoder that filters out biases from the input embeddings. The debiased embeddings are required to simultaneously satisfy three criteria: (a) they must preserve all non-discriminatory information in the pre-trained embeddings (semantic preservation), (b) they must be similar to the dictionary definitions of the words (dictionary agreement), and (c) they must be orthogonal to the subspace spanned by the basis vectors in the pre-trained word embedding space that correspond to discriminatory biases (bias orthogonality). We implement semantic preservation and dictionary agreement using two decoders, whereas bias orthogonality is enforced by a parameter-free projection. The debiasing encoder and the decoders are learnt end-to-end by a joint optimisation method. Our proposed method is agnostic to the details of the algorithms used to learn the input word embeddings. Moreover, unlike counterfactual data augmentation methods for debiasing (Zmigrod et al., 2019; Hall Maudslay et al., 2019), we do not require access to the original training resources used for learning the input word embeddings.
Our proposed method overcomes the above-described challenges as follows. First, instead of learning a lexicalised debiasing model, we operate on the word embedding space when learning the encoder. Therefore, we can use the words in the intersection of the vocabularies of the pre-trained word embeddings and the dictionary to learn the encoder, enabling us to generalise to words not in the dictionary. Second, we do not require pre-compiled word lists specifying the biases. The dictionary acts as a clean, unbiased source of word meaning whose entries can be considered positive examples of debiased meanings. In contrast to existing debiasing methods, which require us to pre-define what to remove, the proposed method can be seen as using the dictionary as a guideline for what to retain during debiasing.
We evaluate the proposed method using four standard benchmark datasets for evaluating biases in word embeddings: the Word Embedding Association Test (WEAT; Caliskan et al., 2017), the Word Association Test (WAT; Du et al., 2019), SemBias (Zhao et al., 2018b) and WinoBias (Zhao et al., 2018a). Our experimental results show that the proposed debiasing method accurately removes unfair biases from three widely used pre-trained embeddings: Word2Vec (Mikolov et al., 2013b), GloVe (Pennington et al., 2014) and fastText (Bojanowski et al., 2017). Moreover, our evaluations on semantic similarity and word analogy benchmarks show that the proposed method preserves useful semantic information in word embeddings while removing unfair biases.

Related Work
Dictionaries have been popularly used for learning word embeddings (Hirst, 2006, 2001; Jiang and Conrath, 1997). Methods that use both dictionaries (or lexicons) and corpora to jointly learn word embeddings (Tissier et al., 2017; Alsuhaibani et al., 2019; Bollegala et al., 2016) or to post-process them (Glavaš and Vulić, 2018; Faruqui et al., 2015) have also been proposed. However, learning embeddings from dictionaries alone results in coverage and data sparseness issues (Bollegala et al., 2016) and does not guarantee bias-free embeddings (Lauscher and Glavas, 2019). To the best of our knowledge, we are the first to use dictionaries for debiasing pre-trained word embeddings.

Bolukbasi et al. (2016) proposed a post-processing approach that projects gender-neutral words into a subspace that is orthogonal to the gender dimension defined by a list of gender-definitional words. They refer to words associated with gender (e.g., she, actor) as gender-definitional words, and the remainder as gender-neutral. They proposed a hard-debiasing method, where the gender direction is computed as the vector difference between the embeddings of corresponding gender-definitional words, and a soft-debiasing method, which balances the objective of preserving the inner-products between the original word embeddings against projecting the word embeddings into a subspace orthogonal to the gender-definitional words. Both hard and soft debiasing methods ignore gender-definitional words during the subsequent debiasing process, and focus only on the words that a classifier predicts as not gender-definitional. Therefore, if the classifier erroneously predicts a stereotypical word as a gender-definitional word, that word would not get debiased. Zhao et al. (2018b) modified the GloVe (Pennington et al., 2014) objective to learn gender-neutral word embeddings (GN-GloVe) from a given corpus.
They maximise the squared ℓ2 distance between gender-related sub-vectors, while simultaneously minimising the GloVe objective. Unlike the above-mentioned methods, Kaneko and Bollegala (2019) proposed a post-processing method that preserves gender-related information with an autoencoder (Kaneko and Bollegala, 2020), while removing discriminatory biases from stereotypical cases (GP-GloVe). However, all prior debiasing methods require us to pre-define the biases in the form of explicit word lists containing gender and stereotypical word associations. In contrast, we use dictionaries as a source of bias-free semantic definitions of words and do not require pre-defining the biases to be removed. Although we focus on static word embeddings in this paper, unfair biases have been found in contextualised word embeddings as well (Zhao et al., 2019; Vig, 2019; Bordia and Bowman, 2019; May et al., 2019).
Adversarial learning methods (Xie et al., 2017; Elazar and Goldberg, 2018; Li et al., 2018) for debiasing first encode the inputs, and then two classifiers are jointly trained: one predicting the target task (for which we must ensure high prediction accuracy) and the other predicting the protected attributes (which must not be easily predictable). However, Elazar and Goldberg (2018) showed that although it is possible to obtain chance-level development-set accuracy for the protected attributes during training, a post-hoc classifier trained on the encoded inputs can still reach substantially high accuracies for the protected attributes. They conclude that adversarial learning alone does not guarantee invariant representations for the protected attributes. Ravfogel et al. (2020) found that iteratively projecting word embeddings to the null space of the gender direction further improves debiasing performance.
To evaluate biases, Caliskan et al. (2017) proposed the Word Embedding Association Test (WEAT), inspired by the Implicit Association Test (IAT; Greenwald et al., 1998). Ethayarajh et al. (2019) showed that WEAT systematically overestimates biases and proposed a correction. The ability to correctly answer gender-related word analogies (Zhao et al., 2018b) and to resolve gender-related coreferences (Zhao et al., 2018a; Rudinger et al., 2018) has been used as extrinsic tasks for evaluating the bias in word embeddings. We describe these evaluation benchmarks later in § 4.3.

Dictionary-based Debiasing
Let us denote the n-dimensional pre-trained word embedding of a word w by w ∈ R^n, trained on some resource C such as a text corpus. Moreover, let us assume that we are given a dictionary D containing the definition, s(w), of w. If the pre-trained embeddings distinguish among the different senses of w, then we can use the gloss for the corresponding sense of w in the dictionary as s(w). However, the majority of word embedding learning methods do not produce sense-specific word embeddings. In this case, we can either use all glosses for w in D by concatenating them, or select the gloss for the dominant (most frequent) sense of w. Without any loss of generality, in the remainder of this paper, we will use s(w) to collectively denote a gloss selected by any one of the above-mentioned criteria, with or without considering the word senses (in § 5.3, we evaluate the effect of using all vs. the dominant gloss).
Next, we define the objective functions optimised by the proposed method for the purpose of learning unbiased word embeddings. Given w, we model the debiasing process as the task of learning an encoder, E(w; θ_e), that returns an m (≤ n)-dimensional debiased version of w. If we would like to preserve the dimensionality of the input embeddings, we can set m = n; setting m < n further compresses the debiased embeddings.
Because the pre-trained embeddings encode rich semantic information from large text corpora, often far exceeding the meanings covered in the dictionary, we must preserve this semantic information as much as possible during the debiasing process. We refer to this constraint as semantic preservation. Semantic preservation is likely to lead to good performance in downstream NLP applications that use pre-trained word embeddings. For this purpose, we decode the encoded version of w using a decoder D_c, parametrised by θ_c, and define J_c to be the reconstruction loss given by (1).
J_c = Σ_w ||D_c(E(w; θ_e); θ_c) − w||²    (1)

Following our assumption that the dictionary definition, s(w), of w is a concise and unbiased description of the meaning of w, we would like to ensure that the encoded version of w is similar to s(w). We refer to this constraint as dictionary agreement. To formalise dictionary agreement empirically, we first represent s(w) by a sentence embedding vector s(w) ∈ R^n. Different sentence embedding methods can be used for this purpose, such as convolutional neural networks (Kim, 2014), recurrent neural networks (Peters et al., 2018) or transformers (Devlin et al., 2019). For simplicity, we use smoothed inverse frequency (SIF; Arora et al., 2017) for creating s(w) in this paper. SIF computes the embedding of a sentence as the weighted average of the pre-trained word embeddings of the words in the sentence, where the weight of each word is a smoothed inverse function of its unigram probability. Next, the first principal component vector of the sentence embeddings is removed. The dimensionality of the sentence embeddings created using SIF is equal to that of the pre-trained word embeddings used. Therefore, in our case we have both w, s(w) ∈ R^n.
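As an illustrative sketch (not the authors' released code), SIF can be computed in a few lines of NumPy. The function name, the smoothing constant a = 10⁻³ and the data structures below are our own choices:

```python
import numpy as np

def sif_embeddings(sentences, vectors, unigram_prob, a=1e-3):
    """Smoothed inverse frequency (SIF; Arora et al., 2017) sentence embeddings.

    sentences: list of token lists; vectors: dict word -> vector;
    unigram_prob: dict word -> corpus unigram probability.
    """
    dim = len(next(iter(vectors.values())))
    emb = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        words = [w for w in sent if w in vectors]
        if not words:
            continue
        # down-weight frequent words by a / (a + p(w))
        weights = np.array([a / (a + unigram_prob.get(w, 0.0)) for w in words])
        emb[i] = (weights[:, None] * np.array([vectors[w] for w in words])).mean(axis=0)
    # remove the first principal component shared across sentence embeddings
    u, _, _ = np.linalg.svd(emb.T @ emb)
    pc = u[:, :1]
    return emb - emb @ pc @ pc.T
```

The principal component is estimated from the sentence-embedding matrix itself, so the returned vectors are the SIF embeddings after common-component removal.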
We decode the debiased embedding E(w; θ_e) of w using a decoder D_d, parametrised by θ_d, and compute the squared ℓ2 distance between it and s(w) to define an objective J_d given by (2).

J_d = Σ_w ||D_d(E(w; θ_e); θ_d) − s(w)||²    (2)
Recalling that our goal is to remove unfair biases from pre-trained word embeddings, and that we assume dictionary definitions to be free of such biases, we define an objective function that explicitly models this requirement. We refer to this requirement as the bias orthogonality of the debiased embeddings. For this purpose, we first project the pre-trained word embedding w of a word w into the subspace orthogonal to the dictionary definition vector s(w). Let us denote this projection by φ(w, s(w)) ∈ R^n. We require that the debiased word embedding, E(w; θ_e), must be orthogonal to φ(w, s(w)), and formalise this as the minimisation of the squared inner-product given in (3).

J_a = Σ_w (E(φ(w, s(w)); θ_e)⊤ E(w; θ_e))²    (3)
Note that because φ(w, s(w)) lies in the original (prior to encoding) vector space, we must first encode it using E before imposing the orthogonality requirement.
To derive φ(w, s(w)), let us assume the n basis vectors of the R^n space spanned by the pre-trained word embeddings to be b_1, b_2, ..., b_n. Moreover, without loss of generality, let the subspace spanned by the first k (< n) basis vectors b_1, b_2, ..., b_k be B ⊆ R^n. The projection v_B of a vector v ∈ R^n onto B can be expressed using the basis vectors as in (4).

v_B = Σ_{i=1}^{k} (v⊤ b_i) b_i    (4)
Likewise, the projection of v onto the subspace orthogonal to B can be expressed using the basis vectors as given in (5).

v − v_B = Σ_{i=k+1}^{n} (v⊤ b_i) b_i    (5)
We see that there are no basis vectors in common between the summations in (4) and (5).
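To make this complementarity concrete, here is a small NumPy sketch of our own using the standard basis: projecting onto the first k basis vectors and onto the remaining n − k basis vectors recovers v exactly, because the two sums share no basis vectors.

```python
import numpy as np

def project_onto_basis(v, basis):
    """Project v onto the subspace spanned by an orthonormal set of basis
    vectors, as in Eq. (4): the sum of (v . b_i) b_i over the given basis."""
    return sum((v @ b) * b for b in basis)
```

For example, with v = (1, 2, 3) and the standard basis of R^3, projecting onto {b_1, b_2} gives (1, 2, 0), projecting onto {b_3} gives (0, 0, 3), and the two projections sum back to v.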
Considering that s(w) defines a direction that does not contain any unfair biases, we can compute the vector rejection of w on s(w) following this result. Specifically, we subtract from w its projection along the unit vector in the direction of s(w) to compute φ as in (6).

φ(w, s(w)) = w − (w⊤ s(w) / ||s(w)||²) s(w)    (6)
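The rejection in (6) is a one-liner; the sketch below (our own naming, not the authors' code) also illustrates the defining property that the result is orthogonal to s(w):

```python
import numpy as np

def phi(w, s):
    """Vector rejection of w on s (Eq. 6): subtract from w its projection
    along the direction of s, leaving the component orthogonal to s."""
    return w - ((w @ s) / (s @ s)) * s
```

For example, phi([3, 4], [1, 0]) = [0, 4], and phi(w, s) @ s = 0 for any w and any non-zero s.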
We consider the linearly-weighted sum of the three objective functions defined above as the total objective, given in (7).

J = α J_c + β J_d + γ J_a    (7)
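Putting the pieces together, a minimal NumPy forward pass over the total objective (7) might look as follows. The single-layer feed-forward networks with tanh outputs match the architecture described in this paper, but the random weights, function names and per-word (unsummed) losses are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = m = 4  # here we preserve the input dimensionality (m = n)

# single-layer feed-forward networks with tanh outputs
W_e = rng.normal(size=(m, n))  # encoder E (the debiaser)
W_c = rng.normal(size=(n, m))  # decoder D_c (semantic preservation)
W_d = rng.normal(size=(n, m))  # decoder D_d (dictionary agreement)
E = lambda x: np.tanh(W_e @ x)
D_c = lambda h: np.tanh(W_c @ h)
D_d = lambda h: np.tanh(W_d @ h)

def total_loss(w, s, alpha=0.99998, beta=1e-5, gamma=1e-5):
    """Per-word total objective J for an embedding w and definition vector s."""
    rej = w - ((w @ s) / (s @ s)) * s        # phi(w, s(w)), Eq. (6)
    J_c = np.sum((D_c(E(w)) - w) ** 2)       # reconstruction, Eq. (1)
    J_d = np.sum((D_d(E(w)) - s) ** 2)       # dictionary agreement, Eq. (2)
    J_a = float(E(rej) @ E(w)) ** 2          # bias orthogonality, Eq. (3)
    return alpha * J_c + beta * J_d + gamma * J_a
```

In practice these parameters would be trained jointly with gradient descent (see the hyperparameter settings below); this sketch only shows how the three terms combine.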
As the dictionary definitions, we used the glosses in the WordNet (Fellbaum, 1998), which has been popularly used to learn word embeddings in prior work (Tissier et al., 2017; Bosc and Vincent, 2018; Washio et al., 2019). However, our proposed method does not depend on any WordNet-specific features, and thus can in principle be applied to any dictionary containing definition sentences. Words that do not appear in the vocabulary of the pre-trained embeddings are ignored when computing s(w) for the headwords w in the dictionary. Therefore, if all the words in a dictionary definition are ignored, we remove the corresponding headword from training. Consequently, we are left with 54,528, 64,779 and 58,015 words respectively for the Word2Vec, GloVe and fastText embeddings in the training dataset. We randomly sampled 1,000 words from this dataset and held them out as a development set for the purpose of tuning the various hyperparameters of the proposed method.
E, D_c and D_d are implemented as single-layer feed-forward neural networks with a hyperbolic tangent activation at the outputs. Pre-training is known to be effective when using the autoencoder formed by E and D_c for debiasing (Kaneko and Bollegala, 2019). Therefore, we randomly select 5,000 words from each pre-trained word embedding set and pre-train the autoencoder on those words with a mini-batch size of 512. In pre-training, the model with the lowest loss according to (1) on the development set is selected.

Hyperparameters
During optimisation, we apply dropout (Srivastava et al., 2014) with probability 0.05 to w and E(w). We use Adam (Kingma and Ba, 2015) with the initial learning rate set to 0.0002 as the optimiser to find the parameters θ_e, θ_c and θ_d, and a mini-batch size of 4. The optimal values of all hyperparameters are found by minimising the total loss over the development dataset following a Monte-Carlo search, yielding α = 0.99998, β = 0.00001 and γ = 0.00001. Note that the scales of the different losses differ, so the absolute values of these hyperparameters do not indicate the significance of a component loss. For example, if we rescale all losses to the same range, we have J_c = 0.005α, J_d = 0.269β and J_a = 21.1999γ. Therefore, the debiasing (J_d) and orthogonalisation (J_a) contributions are significant.
We used a single GeForce GTX 1080 Ti GPU. Debiasing completes in less than an hour because our method is a fine-tuning technique. Our debiasing model has 270,900 parameters.

Evaluation Datasets
We use the following datasets to evaluate the degree of the biases in word embeddings.

WEAT: The Word Embedding Association Test (WEAT; Caliskan et al., 2017) quantifies various biases (e.g. gender, race and age) using semantic similarities between word embeddings. It compares two equally-sized sets of target words X and Y (e.g. European and African names) with two sets of attribute words A and B (e.g. pleasant vs. unpleasant words). The association score of a single target word t with the attributes is given by (8), and the bias score s(X, Y, A, B) by (9).

s(t, A, B) = mean_{a∈A} f(t, a) − mean_{b∈B} f(t, b)    (8)

s(X, Y, A, B) = Σ_{x∈X} s(x, A, B) − Σ_{y∈Y} s(y, A, B)    (9)

Here, f is the cosine similarity between the word embeddings. The one-sided p-value for the permutation test regarding X and Y is calculated as the probability of s(X_i, Y_i, A, B) being greater than s(X, Y, A, B), where (X_i, Y_i) ranges over the equal-size partitions of X ∪ Y. The effect size is calculated as the normalised measure given by (10).

(mean_{x∈X} s(x, A, B) − mean_{y∈Y} s(y, A, B)) / std-dev_{t∈X∪Y} s(t, A, B)    (10)
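For illustration, the association score (8) and the effect size (10) can be computed with NumPy as follows (a sketch of our own over lists of word vectors; whether the sample or population standard deviation is used is an implementation detail, and we use the sample version here):

```python
import numpy as np

def cos(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: normalised difference between the mean association
    scores of the target sets X and Y with the attribute sets A and B."""
    s = lambda t: np.mean([cos(t, a) for a in A]) - np.mean([cos(t, b) for b in B])
    sx, sy = [s(x) for x in X], [s(y) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```

A positive effect size indicates that X is more strongly associated with A (and Y with B); an unbiased embedding yields an effect size near zero.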
WAT: The Word Association Test (WAT) is a method to measure gender bias over a large set of words (Du et al., 2019). It calculates a gender information vector for each word in a word association graph created from the Small World of Words project (SWOWEN; Deyne et al., 2019) by propagating information related to masculine and feminine word pairs (w_i^m, w_i^f) ∈ L using a random-walk approach (Zhou et al., 2003). The gender information is represented as a 2-dimensional vector (b_m, b_f), where b_m and b_f denote respectively the masculine and feminine orientations of a word. The gender information vectors of masculine words, feminine words and all other words are initialised respectively to (1, 0), (0, 1) and (0, 0). The bias score of a word is defined as log(b_m/b_f). We evaluate the gender bias of word embeddings using the Pearson correlation coefficient between the bias score of each word and the score given by (11), computed as the averaged difference of cosine similarities with the masculine and feminine words.

(1/|L|) Σ_{i=1}^{|L|} (cos(w, w_i^m) − cos(w, w_i^f))    (11)
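The propagation step can be sketched as a standard random-walk label propagation in the spirit of Zhou et al. (2003). The damping factor, iteration count and function name below are illustrative assumptions, not the settings used by Du et al. (2019):

```python
import numpy as np

def propagate_gender(adj, masc_idx, fem_idx, lam=0.9, iters=100):
    """Propagate (masculine, feminine) mass over a row-normalised word
    association graph; word i's bias score is log(b[i, 0] / b[i, 1])."""
    n = adj.shape[0]
    b0 = np.zeros((n, 2))
    b0[masc_idx, 0] = 1.0  # masculine seed words start as (1, 0)
    b0[fem_idx, 1] = 1.0   # feminine seed words start as (0, 1)
    b = b0.copy()          # all other words start as (0, 0)
    for _ in range(iters):
        # each step mixes neighbour information with the seed labels
        b = lam * (adj @ b) + (1 - lam) * b0
    return b
```

After convergence, every word reachable from a seed carries a non-zero masculine and feminine orientation, from which the log-ratio bias score is computed.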
SemBias: The SemBias dataset (Zhao et al., 2018b) contains three types of word pairs: (a) Definition, a gender-definition word pair (e.g. hero - heroine), (b) Stereotype, a gender-stereotype word pair (e.g. manager - secretary) and (c) None, two other word pairs with similar meanings unrelated to gender (e.g. jazz - blues, pencil - pen). We use the cosine similarity between the he − she gender directional vector and a − b for each word pair (a, b) in the above lists to measure gender bias. Zhao et al. (2018b) used a subset of 40 instances associated with 2 seed word pairs, not used in the training split, to evaluate the generalisability of a debiasing method (SemBias-subset). For unbiased word embeddings, we expect high similarity scores in the Definition category and low similarity scores in the Stereotype and None categories.
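Scoring a word pair then amounts to a cosine similarity against the gender direction; a minimal sketch (the function and argument names are ours):

```python
import numpy as np

def gender_direction_score(vec_he, vec_she, vec_a, vec_b):
    """Cosine similarity between the he - she direction and a - b.
    High for Definition pairs; low for Stereotype and None pairs
    under an unbiased embedding."""
    g, d = vec_he - vec_she, vec_a - vec_b
    return (g @ d) / (np.linalg.norm(g) * np.linalg.norm(d))
```

A pair difference aligned with the gender direction scores near 1, while one orthogonal to it scores near 0.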
WinoBias/OntoNotes: We use the WinoBias dataset (Zhao et al., 2018a) and OntoNotes (Weischedel et al., 2013) for coreference resolution to evaluate the effectiveness of our proposed debiasing method in a downstream task. WinoBias contains two types of sentences that require linking gendered pronouns to either male- or female-stereotypical occupations. In Type 1, coreference decisions must be made using world knowledge about the given circumstances, whereas Type 2 sentences can be resolved using syntactic information and an understanding of the pronoun. Each type involves two conditions: the pro-stereotyped (pro) condition links pronouns to occupations dominated by the gender of the pronoun, and the anti-stereotyped (anti) condition links pronouns to occupations not dominated by the gender of the pronoun. For a correctly debiased set of word embeddings, the difference between pro and anti is expected to be small. We use the coreference resolution model proposed by Lee et al. (2017), as implemented in AllenNLP (Gardner et al., 2017).
We used a publicly available bias comparison tool (https://github.com/hljames/compare-embedding-bias) for the WEAT evaluation. Since the WAT code has not been published, we contacted the authors to obtain it and used it for evaluation. We used the evaluation code from GP-GloVe (https://github.com/kanekomasahiro/gp_debias) for the SemBias evaluation, and AllenNLP (https://github.com/allenai/allennlp) for the WinoBias and OntoNotes evaluations. We used the evaluate_word_pairs and evaluate_word_analogies functions in gensim (https://github.com/RaRe-Technologies/gensim) for the word embedding benchmarks.

Overall Results
We initialise the word embeddings of the coreference model with the original (Org) and debiased (Deb) word embeddings, and compare coreference resolution accuracy using F1 as the evaluation measure.
In Table 1, we show the WEAT bias effects computed using cosine similarity, and the correlations on the WAT dataset measured using the Pearson correlation coefficient. We see that the proposed method significantly reduces various biases in all word embeddings on both WEAT and WAT. In particular, for Word2Vec and fastText, almost all biases are removed. Table 2 shows the percentages where a word pair is correctly classified as Definition, Stereotype or None. We see that our proposed method successfully debiases word embeddings based on the results for Definition and Stereotype in SemBias. In addition, we see that the SemBias-subset can be debiased for Word2Vec and fastText. Table 3 shows the performance on WinoBias for Type 1 and Type 2 in the pro- and anti-stereotypical conditions. In most settings, the diff is smaller for the debiased than the original word embeddings, which demonstrates the effectiveness of our proposed method. From the results for Avg, we see that debiasing is achieved with almost no loss in performance. In addition, the debiased scores on OntoNotes are higher than the original scores for all word embeddings.

Comparison with Existing Methods
We compare the proposed method against the existing debiasing methods (Bolukbasi et al., 2016; Zhao et al., 2018b; Kaneko and Bollegala, 2019) mentioned in § 2 on WEAT, which covers different types of biases. We debias GloVe, which is used by Zhao et al. (2018b). All word embeddings used in these experiments are the pre-trained word embeddings used in the existing debiasing methods. Words in the evaluation sets T3, T4 and T8 are not covered by the input pre-trained embeddings and are hence not considered in this evaluation. From Table 4 we see that only the proposed method debiases all bias types accurately. T5 and T6 are the tests for gender bias; although prior debiasing methods do well on those tests, they are unable to address other types of biases. Notably, we see that the proposed method can debias more accurately than previous methods that use word lists for gender debiasing, such as Bolukbasi et al. (2016) on T5 and Zhao et al. (2018b) on T6.

Dominant Gloss vs All Glosses
In Table 5, we compare using only the dominant gloss (the gloss for the most frequent sense of the word) when creating s(w) on the SemBias benchmark against using all glosses (the same setting as in Table 2). We see that debiasing using all glosses is more effective than using only the dominant gloss.

Word Embedding Benchmarks
It is important that a debiasing method removes only discriminatory biases and preserves semantic information in the original word embeddings. If the debiasing method removes more information than necessary from the original word embeddings, performance will drop when those debiased embeddings are used in NLP applications. Therefore, to evaluate the semantic information preserved after debiasing, we use semantic similarity and word analogy benchmarks as described next.
Word Analogy: In word analogy, we predict the word d that completes the proportional analogy "a is to b as c is to what?", for four words a, b, c and d. We use CosAdd (Levy and Goldberg, 2014), which selects the d maximising the cosine similarity between the two vectors (b − a + c) and d. Following Zhao et al. (2018b), we evaluate on the MSR (Mikolov et al., 2013c) and Google analogy datasets (Mikolov et al., 2013a), as shown in Table 6. From Table 6 we see that, for all word embeddings, the versions debiased using the proposed method accurately preserve the semantic information in the original embeddings. In fact, except for the Word2Vec embeddings on the WS dataset, the accuracy of the embeddings improves after debiasing, which is a desirable side-effect. We believe this is because the information in the dictionary definitions is incorporated during the debiasing process. Overall, our proposed method removes unfair biases, while retaining (and sometimes further improving) the semantic information contained in the original word embeddings.
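As a concrete sketch, CosAdd over a word-to-vector dictionary can be written as follows; as is standard practice, the three query words are excluded from the candidate set:

```python
import numpy as np

def cos_add(word_a, word_b, word_c, vectors):
    """Answer 'a is to b as c is to ?' by maximising the cosine similarity
    between (b - a + c) and each candidate word vector."""
    target = vectors[word_b] - vectors[word_a] + vectors[word_c]
    best, best_sim = None, -np.inf
    for w, v in vectors.items():
        if w in (word_a, word_b, word_c):
            continue  # exclude the query words themselves
        sim = (target @ v) / (np.linalg.norm(target) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = w, sim
    return best
```

For example, with toy vectors in which king − man + woman lands on queen, cos_add("man", "king", "woman", vectors) returns "queen".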
We also see that for the GloVe embeddings the performance has improved after debiasing, whereas for the Word2Vec and fastText embeddings the opposite is true. A similar drop in performance on word analogy tasks has been reported in prior work (Zhao et al., 2018b). Besides CosAdd, multiple alternative methods have been proposed for solving analogies using pre-trained word embeddings, such as CosMult, PairDiff and supervised operators (Bollegala et al., 2015, 2014; Hakami et al., 2018). Moreover, concerns have been raised about the protocols used in prior work for evaluating word embeddings on word analogy tasks and about their correlation with downstream tasks (Schluter, 2018). Therefore, we defer further investigation of this behaviour to future work.

Visualising the Outcome of Debiasing
We analyse the effect of debiasing by calculating the cosine similarity between neutral occupational words and the gender, race and age directional vectors. The neutral occupational word list is based on Bolukbasi et al. (2016) and is given in the Supplementary. Figure 1 shows the visualisation for Word2Vec. We see that in the original Word2Vec some gender words lie especially far from the origin (0.0). Moreover, age-related words have an overall bias towards "elder". Our debiased Word2Vec gathers the vectors around the origin compared to the original Word2Vec for all gender, race and age vectors.
On the other hand, there are multiple words with high cosine similarity to the female gender direction even after debiasing. We speculate that, in rare cases, their definition sentences themselves contain biases. For example, in the WordNet the definitions of "homemaker" and "nurse" include gender-oriented phrases such as "a wife who manages a household while her husband earns the family income" and "a woman who is the custodian of children". It is therefore necessary to pay attention to biases included in the definition sentences when debiasing using dictionaries, and removing such biases from the dictionaries themselves remains an interesting future challenge. Combining definitions from multiple dictionaries could potentially help to mitigate biases coming from a single dictionary. Another future research direction is to evaluate the proposed method for languages other than English using multilingual dictionaries.

Conclusion
We proposed a method to remove biases from pre-trained word embeddings using dictionaries, without requiring pre-defined word lists. Experimental results on a series of benchmark datasets show that the proposed method removes unfair biases while retaining useful semantic information encoded in pre-trained word embeddings.