Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder

Dense retrievers encode queries and documents into an embedding space using pre-trained language models. These embeddings need to be high-dimensional to fit training signals and guarantee the retrieval effectiveness of dense retrievers. However, such high-dimensional embeddings lead to larger index storage and higher retrieval latency. To reduce the embedding dimensions of dense retrieval, this paper proposes a Conditional Autoencoder (ConAE) that compresses the high-dimensional embeddings while maintaining the same embedding distribution and better recovering the ranking features. Our experiments show that ConAE is effective in compressing embeddings: it achieves ranking performance comparable to its teacher model and makes the retrieval system more efficient. Our further analyses show that ConAE can alleviate the redundancy of the embeddings of dense retrieval with only one linear layer. All code for this work is available at https://github.com/NEUIR/ConAE.


Introduction
As the first stage of numerous multi-stage IR and NLP tasks (Nogueira et al., 2019; Chen et al., 2017; Thorne et al., 2018), dense retrievers (Xiong et al., 2021a) have shown substantial advances in conducting semantic search and avoiding the vocabulary mismatch problem (Robertson and Zaragoza, 2009). Dense retrievers usually encode queries and documents as high-dimensional embeddings, which are necessary to guarantee retrieval effectiveness during training (Ma et al., 2021; Reimers and Gurevych, 2021). Nevertheless, high-dimensional embeddings usually exhaust the memory needed to store the index and lead to longer retrieval latency (Indyk and Motwani, 1998; Meiser, 1993).
Research on building efficient dense retrieval systems has been stimulated recently (Min et al., 2021). To reduce the dimensions of document embeddings, existing work retains the principal dimensions or compresses query and document embeddings to build more efficient retrievers (Yang and Seo, 2021; Ma et al., 2021).
There are two challenges in compressing the embeddings of dense retrievers: (1) the compressed embeddings should share a similar distribution with the original embeddings, making the low-dimensional embedding space uniform and the document embeddings distinguishable; (2) the compressed embeddings should maintain maximal information for matching related queries and documents during retrieval, which helps better align related query-document pairs. This paper proposes a Conditional Autoencoder (ConAE), which aims to build efficient dense retrieval systems by reducing the embedding dimensions of queries and documents. ConAE first encodes high-dimensional embeddings into a low-dimensional embedding space and then generates embeddings that can be aligned to related queries or documents in the original embedding space. In addition, ConAE designs a conditional loss to regulate the low-dimensional embedding space to mimic the embedding distribution of the high-dimensional embeddings. Our experiments show that ConAE is effective at compressing high-dimensional embeddings and avoiding redundant ranking features: it achieves retrieval performance comparable to vanilla dense retrievers and better visualizes the embedding space with t-SNE.

Related Work
Dense retrievers use a bi-encoder architecture to encode queries and documents and map them in an embedding space for retrieval (Karpukhin et al., 2020;Xiong et al., 2021b,a;Lewis et al., 2020;Zhan et al., 2021;Li et al., 2021;Yu et al., 2021).
To learn an effective embedding space, dense retrievers are forced to maintain high-dimensional embeddings to fit training signals.
The most direct way to reduce the dimension of embeddings is to retain part of the dimensions of the high-dimensional embeddings (Yang and Seo, 2021; Ma et al., 2021). Some work uses the first 128 dimensions to encode both queries and documents (Yang and Seo, 2021) or utilizes PCA to retain the principal dimensions and recover most of the information from the raw embeddings (Ma et al., 2021). Other work (Ma et al., 2021) proposes a supervised method, which uses neural networks to compress the high-dimensional embeddings into lower-dimensional ones. These supervised models provide a better dimension reduction approach than unsupervised models by avoiding losing too much information. To optimize the encoders, some work (Ma et al., 2021) continuously trains dense retrievers with contrastive training strategies (Karpukhin et al., 2020; Xiong et al., 2021a).
Then we can calculate the retrieval score f(q, d) of q and d with the dot product f(h_q, h_d) = h_q · h_d. We contrastively train the query and document encoders by maximizing the retrieval probability P(d+ | q, {d+} ∪ D−) of the relevant document d+ (Xiong et al., 2021b,a):

P(d+ | q, {d+} ∪ D−) = exp(f(q, d+)) / (exp(f(q, d+)) + Σ_{d− ∈ D−} exp(f(q, d−))), (1)

where d− is a document sampled from the irrelevant document set D− (Karpukhin et al., 2020; Xiong et al., 2021a).
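The dot-product scoring and the softmax retrieval probability of Eq. 1 can be sketched in numpy (a minimal illustration, not the paper's PyTorch implementation; the function name `retrieval_prob` and the synthetic embeddings are ours):

```python
import numpy as np

def retrieval_prob(h_q, h_pos, h_negs):
    """Softmax probability of the relevant document d+ among {d+} ∪ D−,
    with dot-product relevance scores f(h_q, h_d) = h_q · h_d."""
    scores = np.array([h_q @ h_pos] + [h_q @ h_neg for h_neg in h_negs])
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[0]                             # P(d+ | q, {d+} ∪ D−)

rng = np.random.default_rng(0)
h_q = rng.normal(size=768)                      # 768-dim query embedding
h_pos = h_q + 0.1 * rng.normal(size=768)        # relevant document (near query)
h_neg = rng.normal(size=768)                    # irrelevant document
p = retrieval_prob(h_q, h_pos, [h_neg])
```

Training maximizes this probability (equivalently, minimizes its negative log), which pushes relevant documents toward the query in the embedding space.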

Dimension Compression with ConAE
In this subsection, we introduce ConAE, which compresses the K-dimensional embeddings h_q and h_d of queries and documents to L-dimensional embeddings h^e_q and h^e_d.

Encoder. We first get the initial representations h_q and h_d for query q and document d from existing dense retrievers, such as ANCE (Xiong et al., 2021a). These K-dimensional embeddings are then compressed to low-dimensional ones with two different linear layers, Linear_q and Linear_d:

h^e_q = Linear_q(h_q); h^e_d = Linear_d(h_d), (2)

where h^e_q and h^e_d are L-dimensional embeddings. The dimension L can be 256, 128 or 64, which is much lower than the dimension K of h_q and h_d.
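The encoder of Eq. 2 is just two separate linear projections over frozen teacher embeddings; a minimal numpy sketch (random weights stand in for the learned `Linear_q`/`Linear_d`):

```python
import numpy as np

K, L = 768, 128                      # teacher and compressed dimensions
rng = np.random.default_rng(0)

# Two separate projection matrices, one for queries and one for documents.
W_q = rng.normal(scale=K ** -0.5, size=(K, L))   # stands in for Linear_q
W_d = rng.normal(scale=K ** -0.5, size=(K, L))   # stands in for Linear_d

h_q = rng.normal(size=K)             # frozen teacher query embedding (e.g. ANCE)
h_d = rng.normal(size=K)             # frozen teacher document embedding
h_e_q = h_q @ W_q                    # L-dimensional compressed query embedding
h_e_d = h_d @ W_d                    # L-dimensional compressed document embedding
```

In training, only these projection layers are optimized while the teacher encoders stay fixed.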
Then we use KL divergence to regulate the encoded embeddings to mimic the initial embedding distributions of queries and documents:

L_KL = Σ_{d ∈ D_top} P(d | q, D_top) log( P(d | q, D_top) / P_e(d | q, D_top) ), (3)

where P_e(d | q, D_top) is calculated with Eq. 1 using the encoded embeddings h^e_q and h^e_d, and P(d | q, D_top) with the original embeddings. D_top consists of the top-ranked documents, which are retrieved by the teacher retriever, ANCE.
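The KL term compares two softmax distributions over the same candidate set D_top, one from teacher scores and one from compressed-embedding scores; a hedged numpy sketch (random scores stand in for the real dot products):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def kl_loss(teacher_scores, student_scores):
    """KL(P || P_e) over the candidate documents in D_top."""
    p = softmax(teacher_scores)      # distribution from original embeddings
    p_e = softmax(student_scores)    # distribution from compressed embeddings
    return float(np.sum(p * np.log(p / p_e)))

rng = np.random.default_rng(0)
t = rng.normal(size=100)             # scores of Top100 documents for one query
loss_same = kl_loss(t, t)            # identical distributions give zero loss
loss_diff = kl_loss(t, rng.normal(size=100))
```

Minimizing this loss drives the compressed embedding space to reproduce the teacher's ranking distribution rather than the teacher's raw coordinates.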
Decoder. The decoder module maps the encoded embeddings h^e_q and h^e_d back into the original embedding space by aligning the compressed embeddings h^e_q and h^e_d with h_q and h_d. It aims to optimize the encoder modules to maximally maintain ranking features from the initial representations h_q and h_d of the query and document.
Firstly, we use one linear layer to project h^e_q and h^e_d to K-dimensional embeddings ĥ_q and ĥ_d:

ĥ_q = Linear(h^e_q); ĥ_d = Linear(h^e_d). (4)

Then we respectively train the decoded embeddings ĥ_q and ĥ_d to align with h_q and h_d in the original embedding space using two max margin losses, L_q and L_d. The max margin loss is widely used in previous neural IR research to optimize ranking scores (Xiong et al., 2017; Dai et al., 2018). The first loss L_q optimizes the decoded query representation ĥ_q:

L_q = max(0, 1 − f(ĥ_q, h_{d+}) + f(ĥ_q, h_{d−})), (5)

and we can also optimize the decoded document representation ĥ_d with the second loss function L_d:

L_d = max(0, 1 − f(h_q, ĥ_{d+}) + f(h_q, ĥ_{d−})). (6)

Training Loss. Finally, we train our conditional autoencoder model with the following loss L:

L = L_KL + λ (L_q + L_d), (7)

where λ is a hyper-parameter to weight the autoencoder losses.
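The decoder projection and the two max margin losses above can be sketched as follows (a numpy illustration under assumed shapes; the random weights, the stand-in `L_KL` value and the margin of 1 are ours, not taken from the paper's released code):

```python
import numpy as np

def margin_loss(score_pos, score_neg, margin=1.0):
    """Max margin loss: push the relevant pair's score above the irrelevant one's."""
    return max(0.0, margin - score_pos + score_neg)

rng = np.random.default_rng(0)
K, L, lam = 768, 128, 0.1
W_dec = rng.normal(scale=L ** -0.5, size=(L, K))    # shared decoder linear layer

# Teacher embeddings for a query, a relevant and an irrelevant document.
h_q, h_pos, h_neg = (rng.normal(size=K) for _ in range(3))
# Compressed embeddings produced by the encoder (random stand-ins here).
h_e_q, h_e_pos, h_e_neg = (rng.normal(size=L) for _ in range(3))

h_hat_q = h_e_q @ W_dec       # decoded query, back in K dimensions (Eq. 4)
h_hat_pos = h_e_pos @ W_dec   # decoded relevant document
h_hat_neg = h_e_neg @ W_dec   # decoded irrelevant document

L_q = margin_loss(h_hat_q @ h_pos, h_hat_q @ h_neg)   # Eq. 5
L_d = margin_loss(h_q @ h_hat_pos, h_q @ h_hat_neg)   # Eq. 6
L_KL = 0.05                   # stand-in value for the KL term of Eq. 3
total = L_KL + lam * (L_q + L_d)                      # Eq. 7
```

Both margin losses score decoded embeddings against frozen teacher embeddings, so minimizing them forces the compressed space to retain the ranking features of the original space.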

Experimental Methodology
This section describes the datasets, evaluation metrics, baselines and implementation details of our experiments.
Dataset. Four datasets are used to evaluate the retrieval effectiveness of different dimension reduction models: MS MARCO (Passage Ranking) (Nguyen et al., 2016), NQ (Kwiatkowski et al., 2019), TREC DL (Craswell et al., 2020) and TREC-COVID (Roberts et al., 2020). In our experiments, we randomly sample 50,000 queries from the raw training set of MS MARCO as the development set and use MS MARCO (Dev) as the testing set. The dimension reduction models trained on MS MARCO are also evaluated on two benchmarks, TREC DL and TREC-COVID, to evaluate their generalization ability. All data statistics are shown in Table 1.
Evaluation Metrics. NDCG@10 is used as the evaluation metric on three benchmarks: MS MARCO, TREC DL and TREC-COVID. MS MARCO also uses MRR@10 as the primary evaluation metric (Nguyen et al., 2016). For the NQ dataset, hit accuracy at Top20 and Top100 is used as the evaluation metric, following previous work (Karpukhin et al., 2020).
Baselines. In our experiments, we compare ConAE with two baselines from previous work (Ma et al., 2021): Principal Component Analysis (PCA) and CE. PCA reduces the embedding dimension by retaining the principal components that keep most of the variance of the original representation. The CE model uses two linear layers, W_q and W_d, without biases to transform dense representations of queries and documents into lower-dimensional embeddings (Ma et al., 2021). We also start from CE models and continuously train the whole model to implement our ANCE models, which generate query and document embeddings of different dimensions.
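The PCA baseline can be sketched with a plain SVD-based projection in numpy (a generic illustration of the technique, not the baseline's exact implementation; the helper name `pca_compress` is ours):

```python
import numpy as np

def pca_compress(X, L):
    """Project rows of X onto the top-L principal directions (most variance kept)."""
    mu = X.mean(axis=0)
    Xc = X - mu                                   # center the embeddings
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:L].T, Vt[:L], mu              # compressed rows, basis, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))                  # synthetic document embeddings
Z, components, mu = pca_compress(X, 128)          # 768 -> 128 dimensions
```

Because PCA assumes the informative directions are orthogonal, its weak results in Section 5.1 suggest the ranking features of dense retrievers are spread non-orthogonally across dimensions.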
Implementation Details. All embedding dimension reduction models are based on one of the best dense retrievers, ANCE (Xiong et al., 2021a), and build the document index with exact matching (flat index), implemented with FAISS (Johnson et al., 2019). During training of ConAE, we set the hyperparameter λ to 0.1 and search the Top100 documents using vanilla ANCE to construct the D_top collection for each query. For our CE and ANCE models, we sample 7 negative documents per query for contrastive training, and sample 1 negative document to train ConAE. In our experiments, we set the batch size to 2 and the accumulation step to 8 for ANCE; the batch size and accumulation step are 128 and 1 for the other models. All models are implemented with PyTorch and tuned with the Adam optimizer. The learning rates of ANCE and the other models are set to 2e-6 and 0.001, respectively.
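A flat index performs exhaustive inner-product search over all document embeddings; a numpy equivalent of what FAISS's flat index computes (a sketch for intuition only, not FAISS itself):

```python
import numpy as np

def flat_search(index_embs, query_emb, k=10):
    """Exact (brute-force) inner-product search, as in a FAISS flat index."""
    scores = index_embs @ query_emb      # dot product against every document
    top = np.argsort(-scores)[:k]        # indices of the k highest scores
    return top, scores[top]

rng = np.random.default_rng(0)
docs = rng.normal(size=(10000, 128))             # 128-dim document index
q = docs[42] + 0.01 * rng.normal(size=128)       # query close to document 42
ids, scores = flat_search(docs, q, k=5)
```

This exhaustive scan is what makes index size and embedding dimension directly determine retrieval latency, which is why dimension reduction pays off.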

Evaluation Result
Four experiments are conducted in this section to study the effectiveness of ConAE in reducing embedding dimensions for dense retrieval.

Overall Performance
The performance of different dimension reduction models is shown in Table 2. PCA, CE and ConAE are based on ANCE (Teacher), which freezes the teacher model and only optimizes the dimension projection layers.ANCE starts from CE and continuously tunes all parameters in the model.
Compared with PCA and CE (Ma et al., 2021), ConAE achieves the best performance on almost all datasets, which shows its effectiveness in compressing dense retrieval embeddings. ConAE achieves performance comparable to ANCE (Teacher) while using 128-dimensional embeddings to build the document index on MS MARCO, which significantly reduces the retrieval latency (from 17.152 ms to 3.942 ms per query) and saves index storage (from 26.0G to 4.3G). This demonstrates that ConAE is effective at alleviating the redundancy of the embeddings learned by dense retrievers.
Among all baselines, PCA shows significantly worse ranking performance on MS MARCO, indicating that the embedding dimensions of dense retrievers are usually non-orthogonal. ConAE-128 achieves more than 11% improvement over CE and performs much better on TREC-COVID, demonstrating its ranking effectiveness and generalization ability. ANCE can further improve the retrieval performance of CE by continuously training the query and document encoders, which adapts the teacher model to the low-dimensional version.

Ablation Study
This subsection conducts ablation studies in Table 3 to investigate the effectiveness of different modules in our ConAE model.
ConAE combines both the KL and autoencoder objectives to fully use training signals and regulate the distribution of the compressed embeddings, which usually achieves better retrieval performance.

Embedding Visualization with ConAE
We randomly sample one case from MS MARCO and visualize the embedding space of query and retrieved documents in Figure 1.
We first employ t-SNE (van der Maaten and Hinton, 2008) to visualize the embedding spaces of ANCE (Teacher) and ConAE. As shown in Figure 1(b), ConAE-128 produces a more meaningful visualization: the related query-document pair is closer, and the other documents are distributed around the golden document according to their relevance to the query. The visualization of ANCE (Teacher) is slightly distorted and different from our expectations, which is mainly due to its redundancy. The redundant features usually mislead t-SNE to overfit these ranking features; thus, reducing the embedding dimension of dense retrievers to 128 provides a possible way to alleviate redundant features and better visualize the embedding space of dense retrievers with t-SNE. Besides, ConAE-64 shows lower retrieval performance than ConAE-128 (Sec. 5.1). As shown in Figure 1(c), this mainly derives from ConAE-64 losing some ranking features with its limited embedding dimensions.
The other way to visualize the embedding space is to use ConAE w/o Decoder to project the embeddings to 2-dimensional coordinates. It uses KL divergence to optimize the 2-dimensional embeddings to mimic the relevance score distribution of the teacher model. As shown in Figure 1(d), the distributions of documents are distinguishable, which provides an intuitive way to analyze the ranking-oriented document distribution. In addition, the query is usually far away from the documents. The main reason is that the relevance scores are calculated by dot product, so the embedding norms are meaningful for distinguishing the relevant documents.

Retrieval Performance with HNSW
Besides exact search, we also show the retrieval results of different dimension reduction methods in Table 4, implemented with approximate nearest neighbor (ANN) search using Hierarchical Navigable Small World (HNSW) graphs. With HNSW, retrieval efficiency can be further improved, especially for high-dimensional embeddings. ConAE again keeps its advanced retrieval performance, with less than 1 ms retrieval latency.

Conclusion
This paper presents ConAE, which reduces the embedding dimension of dense retrievers. Our experiments show that ConAE achieves retrieval performance comparable to the teacher model while significantly reducing index storage and accelerating the search process. Our further analyses show that the high-dimensional embeddings of dense retrievers are usually redundant, and ConAE helps to alleviate such redundancy and visualize the embedding space more intuitively and effectively.

Limitations
In this paper, we mainly focus on compressing the embeddings of dense retrievers in an additional stage between query/document encoding and index building. As a result, we fix the query and document embeddings of dense retrievers and project high-dimensional embeddings to low-dimensional ones using only one linear layer. Thus, the effectiveness of ConAE is limited by the number of learnable parameters. Even though ConAE shows performance comparable to ANCE (Teacher), jointly modeling the query/document encoder, the dimension reduction module and index building still shows strong potential to achieve better retrieval performance.

Figure 1 :
Figure 1: Embedding Visualization of Different Dense Retrievers. Figures 1(a), 1(b) and 1(c) are plotted with t-SNE using 768-, 128- and 64-dimensional embeddings, respectively. In Figure 1(d), we directly use ConAE w/o Decoder to visualize the document embedding space of ANCE. The "•" in dark orange denotes the golden document, which is ranked 2nd by ConAE-64 and 1st by the other models. For the other documents, darker blue ones are more relevant to the query.
The different modules in ConAE play different roles. Compared with ConAE w/o Decoder, ConAE w/o KL usually shows better retrieval effectiveness on the two benchmarks MS MARCO and TREC DL, which ask the model to retrieve candidates from the same data source. This demonstrates that our autoencoder architecture can reserve more ranking features to fit the training supervision of MS MARCO. On the other hand, ConAE w/o Decoder shows stronger generalization ability by outperforming ConAE w/o KL on TREC-COVID, which belongs to a different domain. The generalization ability of ConAE w/o Decoder may come from finer-grained training signals from our teacher model. The annotated training signals usually face the hole rate problem (Xiong et al., 2020), and using neural IR models to denoise the training signals has shown strong effectiveness in training neural IR models (Qu et al., 2021).

Table 2 :
Performance of Different Dimension Reduction Models. We start from ANCE (Teacher), reduce the embedding dimension and evaluate the retrieval effectiveness. The document indices are built with a flat index; the sizes of the MS MARCO indices are 26.0G, 8.5G, 4.3G and 2.2G for 768-, 256-, 128- and 64-dimensional embeddings.

Table 3 :
Retrieval Performance of Different Ablation Models. ConAE w/o Decoder and ConAE w/o KL use L_KL and L_q + L_d, respectively, to train the distillation models.

Table 4 :
ANN Retrieval Effectiveness of Different Models.The ANN index is built with HNSW.