Rethinking Negative Pairs in Code Search

Recently, contrastive learning has become a key component in fine-tuning code search models, improving the efficiency and effectiveness of software development. Given a search query, it pulls positive code snippets together while pushing negative samples away. Among contrastive learning losses, InfoNCE is the most widely used owing to its superior performance. However, the following problems with the negative samples of InfoNCE may deteriorate its representation learning: 1) the existence of false negative samples in large code corpora due to duplication; 2) the failure to explicitly differentiate between the potential relevance of negative samples. As an example, a bubble sort implementation is less ``negative'' than a file-saving function for a quick sort query. In this paper, we tackle the above problems by proposing a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE. In our proposed loss function, we apply three methods to estimate the weights of negative pairs and show that the vanilla InfoNCE loss is a special case of Soft-InfoNCE. Theoretically, we analyze the effects of Soft-InfoNCE on controlling the distribution of learnt code representations and on deducing a more precise mutual information estimate. We further discuss the superiority of the proposed loss function over other design alternatives. Extensive experiments demonstrate the effectiveness of Soft-InfoNCE and the weight estimation methods on state-of-the-art code search models over a large-scale public dataset covering six programming languages. Source code is available at \url{https://github.com/Alex-HaochenLi/Soft-InfoNCE}.


Introduction
Code search is a common activity in software development that can boost the productivity of software developers (Nie et al., 2016; Shuai et al., 2020). Code search models retrieve code fragments relevant to a given query from code bases (Grazia and Pradel, 2022). To train or fine-tune code search models, contrastive learning has become a key component in learning discriminative representations of queries and codes, as it pushes apart negative query-code pairs and pulls together positive pairs (Shi et al., 2022a; Li et al., 2022a). InfoNCE (Van den Oord et al., 2018) is a representative contrastive learning loss that treats the other in-batch samples as negatives for a given query (Huang et al., 2021). Although InfoNCE is effective in code search, we argue it is sub-optimal at discriminating code samples, as it suffers from the following problems. First, false negatives exist in code bases. Lopes et al. (2017) and Allamanis (2019) find that code duplication is common in large code corpora, which means that many negative pairs are in fact false negatives. Training with false negative pairs may deteriorate code representation learning. Second, InfoNCE ignores the potential relevance of negative codes (Li et al., 2022c). For example, given a query asking about quick sorting algorithms, among negative codes a bubble sorting algorithm is expected to be retrieved before file-saving functions, but the current training procedure cannot model this relationship explicitly. The InfoNCE loss treats all negative codes equally, as shown in Fig. 1. In fact, false negative cancellation is a special case of modeling potential relevance, where the former treats potential relevance as binary while the latter describes it on a continuous scale. Although some methods (Huynh et al., 2022; Chen et al., 2022; Li et al., 2022c) have been proposed to solve these two problems, they are all applied during model pre-training; in the fine-tuning stage, the InfoNCE loss is still used. As a result, the two problems above remain largely unexplored in the fine-tuning stage.
In this work, we first revisit the commonly used InfoNCE loss and explain why it cannot model potential relations among codes explicitly. We then present Soft-InfoNCE to handle this problem by simply inserting a weight term into the denominator of the InfoNCE loss. We also propose three methods to estimate the weight terms and compare them empirically. To justify the effect of the Soft-InfoNCE loss, we theoretically analyze its properties with regard to representation distribution and mutual information estimation. Our analysis indicates that the inserted weight encourages negative pairs to approximate a given distribution and reduces bias in the estimation of mutual information by leveraging importance sampling. Moreover, we prove that the proposed Soft-InfoNCE loss upper bounds other loss function designs such as Binary Cross-Entropy and weighted InfoNCE. We also relate existing false negative cancellation methods to ours. Finally, we demonstrate the effectiveness of the proposed Soft-InfoNCE loss by evaluating it on several pre-trained models across six large-scale datasets. Additional ablation studies validate our theoretical analysis empirically.
In summary, the contributions of this work are as follows: • We propose a novel contrastive loss, Soft-InfoNCE, that explicitly models the potential relations among negative pairs by simply inserting a weight term into the vanilla InfoNCE loss.
• We conduct a theoretical analysis showing that the Soft-InfoNCE loss can control the distribution of learnt representations and reduce the bias in mutual information estimation. We also prove the superiority of the Soft-InfoNCE loss over other design choices and show that previous false negative cancellation works can be considered a special case of our proposed methods.
• We apply the Soft-InfoNCE loss to several code search models and evaluate them on the public CodeSearchNet dataset with six programming languages. Extensive experimental results verify the validity of our theoretical analysis and the effectiveness of our method.

Preliminaries
Code search aims at retrieving the most relevant code fragments for a given query. During training, we take the comment of a code fragment as its query and maximize the similarity between the query and its associated code. Meanwhile, we minimize the similarities between negative pairs generated by In-Batch Negatives (Huang et al., 2021). Formally, given a dataset $\{(x_i, y_i)\}_{i=1}^{K}$, where $x_i$ is a code fragment, $y_i$ is its corresponding query, and $K$ is the size of the dataset, a Siamese encoder $g: \mathcal{C} \cup \mathcal{Q} \rightarrow \mathcal{H}$ maps codes and queries to a shared representation space $\mathcal{H}$. Thus, we obtain two representation sets, $H_c = \{g(x_i)\}_{i=1}^{K}$ and $H_q = \{g(y_i)\}_{i=1}^{K}$. We calculate the similarity between query-code pairs by dot product or cosine distance, and we optimize the distribution of representations by contrastive learning. Several loss functions have been proposed for this objective (Mikolov et al., 2013; Weinberger and Saul, 2009; Hadsell et al., 2006). Among them, the InfoNCE loss (Van den Oord et al., 2018) is dominantly adopted by recent code search models due to its better performance. We denote $q_i \in H_q$ and $c_i \in H_c$ as the representations of queries and codes, respectively. For a given batch of data, we can generate 1 positive pair and $N-1$ negative pairs for each query, where $N$ is the batch size. The InfoNCE loss can be described as:
$$\mathcal{L}_{\text{InfoNCE}} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(q_i \cdot c_i)}{\sum_{j=1}^{N}\exp(q_i \cdot c_j)}, \quad (1)$$
where $(q_i, c_j)$ are positive pairs when $i = j$ and negative pairs otherwise. Here we adopt the dot product as the measure of similarity.
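For concreteness, the in-batch InfoNCE loss can be computed directly from a batch of query and code embeddings. The NumPy sketch below is illustrative only, not the authors' released implementation:

```python
import numpy as np

def info_nce(q, c):
    """In-batch InfoNCE: q, c are (N, d) arrays of query/code embeddings.
    (q_i, c_i) is the positive pair; (q_i, c_j) with j != i are negatives."""
    sim = q @ c.T                                  # (N, N) dot-product similarities
    sim = sim - sim.max(axis=1, keepdims=True)     # shift rows for numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()               # -1/N * sum_i log softmax row i at i

# Well-separated pairs yield a small loss: with 4 orthogonal, strongly
# aligned pairs the positive logit dominates each row's softmax.
loss = info_nce(5 * np.eye(4), np.eye(4))
```

Because each row's softmax probability of the positive is strictly below 1, the loss is always positive, and it shrinks as positive pairs become better separated from the in-batch negatives.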

Revisiting InfoNCE Loss
We reformulate Eq. (1) as:
$$\mathcal{L}_{\text{InfoNCE}} = -\frac{1}{N}\sum_{i=1}^{N} q_i \cdot c_i + \frac{1}{N}\sum_{i=1}^{N}\log\sum_{j=1}^{N}\exp(q_i \cdot c_j). \quad (2)$$
As discovered by Wang and Isola (2020), the two terms correspond to the two objectives of contrastive learning. The first term expresses the alignment of positive pairs, pulling positive instances together. The second term enforces a uniform distribution over negative pairs, since it pushes all pairs apart.
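The decomposition above can be verified numerically: the negative log-softmax of the positive pair equals the alignment term plus the log-sum-exp uniformity term. A small NumPy check on random embeddings (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))   # query embeddings
c = rng.standard_normal((4, 8))   # code embeddings
sim = q @ c.T                     # (N, N) similarity matrix

# Direct InfoNCE: -1/N sum_i log softmax_i(i)
infonce = -np.mean(np.diag(sim) - np.log(np.exp(sim).sum(axis=1)))

# Decomposition: alignment of positives + uniformity over all pairs
alignment = -np.diag(sim).mean()
uniformity = np.log(np.exp(sim).sum(axis=1)).mean()

assert np.isclose(infonce, alignment + uniformity)
```

The identity holds exactly for any batch, since log(a/b) = log a - log b applies row by row.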
However, we argue that negative pairs should not be distributed uniformly. In other words, unlabeled data may also share some similarity with the given query. Suppose we have the query "How to implement a bubble sorting algorithm in Python?". Although both a quick sorting algorithm and a file-saving function are considered negative results, quick sorting is expected to be retrieved before file saving since it is more relevant to the query. Moreover, Lopes et al. (2017) and Allamanis (2019) find that code duplication is common in code corpora, which means there are many false negative examples during training. In conclusion, negative samples in a batch should not be treated equally, as illustrated in Fig. 1.

Soft-InfoNCE Loss
To address the aforementioned problems, we propose the Soft-InfoNCE loss, obtained by simply inserting a weight term $w_{ij}$ into the original formulation:
$$\mathcal{L}_{\text{Soft}} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(q_i \cdot c_i)}{\exp(q_i \cdot c_i) + \sum_{j \neq i}^{N} w_{ij}\exp(q_i \cdot c_j)}, \quad (3)$$
$$w_{ij} = \frac{(N-1)(\alpha - \beta \cdot sim_{ij})}{\sum_{k \neq i}^{N}(\alpha - \beta \cdot sim_{ik})}, \quad (4)$$
where $sim_{ij} \in [0, 1]$, $\sum_{j \neq i}^{N} sim_{ij} = 1$, and $\alpha, \beta$ are hyper-parameters. $sim_{ij}$ is the similarity score between query $q_i$ and code $c_j$. The numerator in Eq. (4) puts smaller weights on less negative codes, and the denominator is a normalization factor ensuring $\sum_{j \neq i}^{N} w_{ij} = N - 1$, the same value as in the vanilla InfoNCE loss. We can now derive that the gradients of negative pairs are proportional to $w_{ij}$:
$$\frac{\partial \mathcal{L}_{\text{Soft}}}{\partial (q_i \cdot c_j)} = \frac{1}{N} \cdot \frac{w_{ij}\exp(q_i \cdot c_j)}{\exp(q_i \cdot c_i) + \sum_{k \neq i}^{N} w_{ik}\exp(q_i \cdot c_k)}, \quad (5)$$
where $w_{ij} = 1$ when $i = j$. The vanilla InfoNCE loss is a special case of Eq. (3) that sets all $w_{ij}$ to 1. Models may learn implicit relationships between a query and different negative samples under the vanilla InfoNCE loss, but we argue that modeling this relationship explicitly through $w_{ij}$ has a positive influence on learning better representations.
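To make the weighting concrete, the sketch below computes the weights from a row-normalized similarity matrix, assuming raw weights proportional to the text's "smaller weight for higher similarity" rule and normalized so each row's negative weights sum to N - 1, as described above. Function name and shapes are ours, for illustration only:

```python
import numpy as np

def soft_weights(sim, alpha=1.3, beta=0.7):
    """Soft-InfoNCE weights: raw weight (alpha - beta * sim_ij) per negative,
    normalized so that sum_{j != i} w_ij = N - 1 in each row.
    `sim` is (N, N); each row sums to 1 over j != i, diagonal unused."""
    N = sim.shape[0]
    raw = alpha - beta * sim                          # smaller weight for higher sim
    np.fill_diagonal(raw, 0.0)                        # exclude the positive pair
    w = (N - 1) * raw / raw.sum(axis=1, keepdims=True)
    np.fill_diagonal(w, 1.0)                          # positives keep weight 1
    return w

# Uniform similarity, sim_ij = 1/(N-1), recovers vanilla InfoNCE: all w_ij = 1.
```

With uniform similarity scores, every raw weight is equal, so normalization maps them all to 1 regardless of alpha and beta, matching the claim that vanilla InfoNCE is a special case.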
Next comes the estimation of the similarity score $sim_{ij}$. The ideal solution would use human-annotated labels. However, no such datasets exist, and labeling all possible negative pairs is challenging since their number grows quadratically with the number of positive pairs (e.g., 100 positive pairs generate 9,900 negative pairs). Thus, we employ the following approaches to estimate the similarity scores $sim_{ij}$, and we empirically analyze and compare them in Section 6.
BM25.BM25 is an enhanced version of TF-IDF, which matches certain terms in codes with the given query.
SimCSE. Unsupervised SimCSE (Gao et al., 2021) is a recently proposed method with outstanding performance on sentence similarity tasks. We measure the similarity between a query and an unlabeled code indirectly, by measuring the similarity between the query and the code's positive query. The underlying assumption is that a query and its positive code are exactly matched, so they are perfectly aligned in the representation space.
Trained Model. Models trained with the vanilla InfoNCE loss on the datasets may already have a certain capability to predict the similarity correctly.
Note that for the last two estimation methods, we load and freeze their pre-trained parameters during training. After calculating the similarity scores, we normalize the results with a Softmax function with temperature $t$ to satisfy $\sum_{j \neq i}^{N} sim_{ij} = 1$.

Theoretical Analysis

In this section, we analyze the properties of the Soft-InfoNCE loss and compare it with related works to justify its effectiveness. Recall that the optimization objective of the vanilla InfoNCE loss can be divided into two parts, alignment and uniformity. The insertion of $w_{ij}$ only influences the second term. Thus, for simplicity, we analyze the second term in this section:
$$\mathcal{L}_{\text{uniform}} = \frac{1}{N}\sum_{i=1}^{N}\log\sum_{j \neq i}^{N} w_{ij}\exp(q_i \cdot c_j). \quad (6)$$

Effect on Representation Distribution
Intuitively, we can control the distribution of negative samples by setting different weights. Here we theoretically prove that the Soft-InfoNCE loss upper bounds the KL divergence between the predicted similarities of negative pairs and $sim_{ij}$.
Theorem 1 For a batch of query representations $\{q_i\}_{i=1}^{N}$, code representations $\{c_i\}_{i=1}^{N}$, and similarity scores $S_i = \{sim_{ij}\}_{j \neq i}^{N}$, the uniformity term of the Soft-InfoNCE loss upper bounds a weighted combination of the predicted similarities of negative pairs and the KL divergence $KL(S_i \,\|\, P_\theta(\cdot|q_i))$, where $N$ is the batch size and $P_\theta(c_j|q_i)$ is the similarity between query $q_i$ and code $c_j$ predicted by model $\theta$.
The proof of Theorem 1 is presented in Appendix B.1. We observe that, in addition to the original objective of minimizing the predicted similarity scores of negative pairs, the second term encourages the similarity distribution to fit the given distribution $S_i$. When we set $\alpha = \beta = 1$ and $sim_{ij} = \frac{1}{N-1}$, all $w_{ij} = 1$ and the loss reduces to the vanilla InfoNCE loss. This also serves as an explanation of the uniformity objective of the vanilla InfoNCE loss.

Effect on Mutual Information Estimation
It has already been proved that optimizing the InfoNCE loss improves a lower bound on the mutual information of a positive pair (Van den Oord et al., 2018).
Since the optimal value of $\exp(q \cdot c)$ is given by $\frac{p(c|q)}{p(c)}$, the derivation can be described as follows:
$$\mathcal{L}_{\text{uniform}} = \frac{1}{N}\sum_{i=1}^{N}\log\sum_{j \neq i}^{N}\frac{p(c_j|q_i)}{p(c_j)} \quad (8)$$
$$\approx \frac{1}{N}\sum_{i=1}^{N}\log\left((N-1)\,\mathbb{E}_{c \in C_{neg}}\!\left[\frac{p(c|q_i)}{p(c)}\right]\right), \quad (9)$$
where $C_{neg}$ is the whole set of negative codes for the given query $q_i$. A key step in the above derivation is the approximation from Eq. (8) to Eq. (9): in Eq. (8) we use the sum of $\frac{p(c_j|q_i)}{p(c_j)}$ over a batch to estimate that over the whole negative set. As Van den Oord et al. (2018) mention, the approximation becomes more accurate as $N$ increases. Factoring out the constant term $\log(N-1)$, it becomes clear that the InfoNCE loss builds a Monte-Carlo estimate sampled from a uniform distribution:
$$\mathbb{E}_{c \sim \mathcal{U}(C_{neg})}\!\left[\frac{p(c|q_i)}{p(c)}\right] \approx \frac{1}{N-1}\sum_{j \neq i}^{N}\frac{p(c_j|q_i)}{p(c_j)}. \quad (10)$$
Inserting $w_{ij}$ into Eq. (10) reduces estimation bias, since it can be considered as applying an importance sampling strategy, resulting in a more precise estimate. Specifically, we want to estimate the expectation of $\frac{p(c_j|q_i)}{p(c_j)}$ under the real distribution between the query $q_i$ and negative codes $c_j$ in the context of code search. However, the negative codes in a batch are randomly sampled and thus follow a uniform distribution. To bridge this gap and obtain an unbiased estimator, importance sampling is adopted by inserting a weight term $w_{ij}$. Denoting the uniform distribution as $p$ and the real distribution as $q$, an expectation under $q$ can be computed from samples of $p$:
$$\mathbb{E}_{c_j \sim q}\!\left[\frac{p(c_j|q_i)}{p(c_j)}\right] = \mathbb{E}_{c_j \sim p}\!\left[\frac{q(c_j)}{p(c_j)} \cdot \frac{p(c_j|q_i)}{p(c_j)}\right]. \quad (11)$$
If we take both $\alpha$ and $\beta$ as 1 in Eq. (4), we have:
$$w_{ij} = \frac{(N-1)(1 - sim_{ij})}{N-2}, \quad (12)$$
and we find that $q(c_j) = \frac{1 - sim_{ij}}{N-2}$, which is inversely proportional to $sim_{ij}$. This property is in line with our intuition that true negatives should contribute more when approximating the whole negative set. Importance sampling gives more accurate estimates when the sampling distribution is closer to the real distribution. We hypothesize that, for a given query, the majority of codes share low similarity with it, which makes the Softmax function suitable for normalizing $sim_{ij}$. We discuss this hypothesis empirically in Section 6.
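The importance-sampling view can be sanity-checked on toy numbers: with $p$ uniform over the $N-1$ in-batch negatives and the target distribution $q(c_j) = \frac{1-sim_{ij}}{N-2}$ (the $\alpha=\beta=1$ case), the weighted batch average recovers the expectation under $q$ exactly. A small NumPy check, where the values of `f` are arbitrary stand-ins for the density ratios:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                   # batch size -> M = N - 1 negatives
M = N - 1
f = rng.random(M)                       # stand-in for p(c_j|q_i) / p(c_j)
sim = rng.random(M)
sim /= sim.sum()                        # similarity scores, sum to 1 over negatives
q_dist = (1.0 - sim) / (N - 2)          # target distribution (alpha = beta = 1)
w = q_dist * M                          # w_j = q(c_j) / p(c_j), with p uniform = 1/M

exact = (q_dist * f).sum()              # E_q[f], computed exactly
weighted = (w * f).mean()               # importance-weighted batch average
assert np.isclose(exact, weighted)
```

Note that `q_dist` sums to 1 by construction, since the M similarity scores sum to 1 and (M - 1)/(N - 2) = 1.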

Relation with Other Loss Functions
In this part, we connect Soft-InfoNCE loss with other loss functions by analyzing them theoretically.Besides, in Section 6 we report their comparative evaluation results empirically.
Binary Cross-Entropy Loss. We may consider $sim_{ij}$ as soft labels and hence use the Binary Cross-Entropy (BCE) loss to train the model. The probability in BCE is calculated as $\frac{\exp(q_i \cdot c_i)}{\sum_{j=1}^{N}\exp(q_i \cdot c_j)}$. While the Soft-InfoNCE loss upper bounds the KL divergence, BCE lower bounds it, as described in Theorem 2.
Theorem 2 For a batch of query representations $\{q_i\}_{i=1}^{N}$, code representations $\{c_i\}_{i=1}^{N}$, and similarity scores $S_i = \{sim_{ij}\}_{j \neq i}^{N}$, the BCE loss lower bounds the KL divergence $KL(S_i \,\|\, P_\theta(\cdot|q_i))$, where $N$ is the batch size and $P_\theta(c_j|q_i)$ is the similarity between query $q_i$ and code $c_j$ predicted by model $\theta$.
The proof of Theorem 2 is presented in Appendix B.2. In general, minimizing an upper bound leads to better performance than minimizing a lower bound.
Weighted InfoNCE Loss. Another choice is a weighted InfoNCE loss that takes $sim_{ij}$ as per-pair weights:
$$\mathcal{L}_{W} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} sim_{ij}\log\frac{\exp(q_i \cdot c_j)}{\sum_{k=1}^{N}\exp(q_i \cdot c_k)}. \quad (13)$$
Note that we set the weight $sim_{ii}$ of the positive pair to 1.
Proposition 1 The Soft-InfoNCE loss upper bounds the weighted InfoNCE loss: $\mathcal{L}_{Soft} \geq \mathcal{L}_{W}$. Equality holds when all code fragments in the batch are actually false negatives.
The proof of Proposition 1 is presented in Appendix B.3. Without loss of generality, the proof is deduced for a single query, but it also holds for a batch of queries. From the above proposition, the Soft-InfoNCE loss $\mathcal{L}_{Soft}$ upper bounds the weighted InfoNCE loss $\mathcal{L}_{W}$; thus, one would expect $\mathcal{L}_{Soft}$ to be the superior loss function.
KL Divergence Regularization. As proved in Theorem 1, a KL divergence term measuring the distance between the given distribution $S$ and the predicted distribution $P_\theta$ is implicitly incorporated during optimization. We also try explicitly adding a KL divergence regularization term to the vanilla InfoNCE loss and compare its performance with our proposed loss empirically.

Relation with False Negative Cancellation
Recently, several works have focused on eliminating the effect of false negative samples by first detecting those samples and then removing them from the negative set. Though the detection methods differ across tasks, the cancellation operations share the same principle: false negatives are removed from the denominator of the InfoNCE loss. This can be considered a special case of our proposed Soft-InfoNCE loss that sets the weights $w_{ij}$ of false negatives to 0 while keeping the others at 1. False negative cancellation methods are effective in classification or unsupervised pre-training tasks, since they only need to consider whether negative samples belong to the same class. However, in the context of a similarity-based retrieval task, models are required to discriminate negative samples by continuous values. In Section 6, we report an empirical comparison between cancellation methods and ours.
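Viewed through this lens, top-K cancellation is just a binary weighting scheme. A hypothetical sketch (function name and shapes are ours, not from any released code):

```python
import numpy as np

def cancellation_weights(sim, k):
    """Top-K false negative cancellation as binary Soft-InfoNCE weights:
    the k most similar negatives per query get weight 0, the rest weight 1."""
    s = sim.copy()
    np.fill_diagonal(s, -np.inf)                # positives are never cancelled
    w = np.ones_like(sim)                       # default weight 1 everywhere
    rows = np.arange(sim.shape[0])[:, None]
    topk = np.argsort(-s, axis=1)[:, :k]        # indices of k largest per row
    w[rows, topk] = 0.0                         # zero out suspected false negatives
    return w
```

Because the weights are hard 0/1 rather than continuous, this variant cannot express graded relevance among the surviving negatives, which is exactly the limitation discussed above.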

Experimental Setup
In this section, we describe the dataset, baselines, and implementation details.
Datasets. We use the large-scale benchmark dataset CodeSearchNet (CSN) (Husain et al., 2019), which contains six programming languages (Ruby, Python, Java, JavaScript, PHP, and Go), to evaluate the effectiveness of the Soft-InfoNCE loss. The dataset is widely used in previous studies (Feng et al., 2020; Guo et al., 2021, 2022), and its statistics are shown in Appendix C.1. The training set contains positive-only query-code pairs, while for the validation and test sets the model retrieves the true code fragment from a fixed codebase. We follow Guo et al. (2021) to filter out low-quality examples. Performance is measured by the widely adopted Mean Reciprocal Rank (MRR), the average of the reciprocal ranks of the true code fragments for the given queries:
$$\text{MRR} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{1}{\text{Rank}_i},$$
where $\text{Rank}_i$ is the rank of the true code fragment for the $i$-th query in the query set $Q$.
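MRR is straightforward to compute from the 1-based ranks of the true code fragments; a minimal sketch:

```python
def mean_reciprocal_rank(ranks):
    """MRR over 1-based ranks of the true code fragment for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# E.g., three queries whose true snippets rank 1st, 2nd, and 4th
# give (1 + 1/2 + 1/4) / 3.
```

MRR is 1.0 only when every query's true code fragment is ranked first, and it decays quickly as true fragments fall down the ranking.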
Baselines. We apply the Soft-InfoNCE loss to several code search models. CodeBERT is a bi-modal model pre-trained on masked language modeling and replaced token detection (Feng et al., 2020); note that in this work we use CodeBERT to refer to the Siamese network architecture described in the original paper. GraphCodeBERT incorporates code structure information and develops two structure-based pre-training tasks: node alignment and data flow edge prediction (Guo et al., 2021). UniXCoder unifies understanding and generation pre-training tasks to enhance code representations, leveraging cross-modal contents such as Abstract Syntax Trees (Guo et al., 2022).
Implementation Details. For all settings related to model architectures, we follow the original papers. For the hyper-parameters that affect the calculation of the Soft-InfoNCE loss, we provide details in Appendix C.2. For BM25 estimation, we measure similarity based only on in-batch data. For trained-model estimation, we train the same model as each studied model, following the training settings of the original paper, on each programming language separately until convergence. For SimCSE estimation, we initialize the model with the HuggingFace released parameters and then train it following the default settings of the original paper. Note that we collect the natural language queries from the training sets of all programming languages to boost the performance of SimCSE unsupervised learning. The number of training epochs is set to 30 for all studied models. Experiments are run with 3 random seeds: 1234, 12345, and 123456. All results pass significance tests at p < 0.01, except for GraphCodeBERT and UniXCoder on the Go dataset. Experiments are conducted on a GeForce RTX A6000 GPU.

Results
In this section, we first show the overall performance of the three weight estimation approaches when applying the Soft-InfoNCE loss. Then, we empirically compare our method with other loss functions and with false negative cancellation methods. Finally, we conduct an ablation study on the effect of the hyper-parameters.
Overall Results. The results in Table 1 reveal that baseline models equipped with the Soft-InfoNCE loss gain an overall 1%-2% MRR improvement over the InfoNCE loss across six programming languages. The consistent improvements observed for all three estimation approaches demonstrate the effectiveness of our proposed Soft-InfoNCE loss in code search. A time efficiency comparison is also provided in Appendix D.1.

Comparison among Estimation Approaches.
As shown in Table 1, trained-model estimation significantly outperforms the other two methods on CodeBERT, while all estimation approaches improve the other two models to a similar extent. Among them, SimCSE is the most robust with respect to model type. In Fig. 2, we take a random batch of samples from CSN-Python to analyze the differences among the estimation approaches. From the left panel, we find that the predicted similarity scores roughly follow a softmax distribution, in line with our hypothesis. From the right panel, we see that, compared with the other estimation methods, BM25 generates similar weights for the majority of negative samples. This is because BM25 calculates similarities based only on keyword matching, and for most negative codes there is no keyword overlap at all, whereas neural-model-based methods can capture latent semantic similarities. Considering the estimated weights of samples from the trained models in the right panel of Fig. 2, we find that some weights predicted by the three models are similar, while others differ or even contradict each other.
To better understand the differences among these estimation methods, we perform case studies on the eight samples in the right panel of Fig. 2 in Appendix D.2. From the case study, we find contradictory predictions when InfoNCE-tuned code search models are used to predict $w_{ij}$. This phenomenon indicates that although existing code search models can find true code snippets well, they cannot recognize the potential relevance among negative codes, which is the core motivation for proposing Soft-InfoNCE. Note that in this paper we do not investigate estimation methods exhaustively but mainly focus on the effectiveness of Soft-InfoNCE, as discussed in the Limitations section.
Comparison with Other Loss Functions. Table 2 shows the overall performance of different loss functions, and Table 5 gives the detailed results for each programming language. For the calculation of these loss functions, we follow the definitions in Sec. 4.3. Note that for KL regularization we set the weights of the original loss and the regularization term to 1.3 and 0.7 for a fair comparison with the Soft-InfoNCE loss, and we use SimCSE estimation for all experiments. We observe significant drops in MRR compared with the Soft-InfoNCE loss, in agreement with our theoretical analysis. We believe those three loss functions somewhat increase the mutual information of negative pairs. Take weighted InfoNCE as an example: though the weights for negative pairs are only around 0.03, optimizing negative pairs still lower bounds their mutual information up to a constant $\log N$, according to Eq. (9). This makes feature vectors cluster more closely and hence harder to distinguish from one another, degrading performance even below InfoNCE. The same analysis applies to BCE and KL regularization, which also increase the similarity between a query and negative codes. Therefore, we argue that negative pairs should not be placed in the numerator of InfoNCE.
Comparison with False Negative Cancellation Methods. We apply two types of false negative cancellation methods and evaluate their performance, as shown in Table 3. The first type removes the top-K most similar negatives. We observe that performance decreases as more negative samples are removed, because sometimes there are no false negative samples in the batch; removing the top-K negatives directly may thus accidentally remove hard negative samples. The other option is a dynamic threshold, which treats negative samples whose similarity exceeds a certain ratio of the positive sample's similarity as false negatives. Besides sharing the drawbacks of the top-K method, it is also hard to determine an appropriate ratio, so applying the dynamic threshold leads to slight drops. In contrast, Soft-InfoNCE achieves an MRR of 0.700, better than the results in Table 3; we argue that down-weighting negatives is less risky than removing them directly.
Effect of α and β. α and β control the weights of the two terms in Theorem 1, one for minimizing predicted similarities and the other for the KL divergence, which contradict each other to some extent. To balance training, we performed empirical experiments to guide the setting of these two hyper-parameters, as shown in Fig. 3. Note that we follow $\frac{\alpha+\beta}{2} = 1$ to keep Soft-InfoNCE on the same scale as the original InfoNCE loss. The left part of Fig. 3 shows that the performance of Soft-InfoNCE is relatively stable across settings, with α = 1.3, β = 0.7 reaching the best performance. Besides, since α and β are incorporated into the calculation of the weights, they also affect weight estimation: as shown in the right part of Fig. 3, increasing α makes the weights more distinct.

Related Works
Code Search Models. There are mainly three stages in the development of code search models. Traditional information retrieval techniques match keywords between queries and code fragments (Hill et al., 2011; Yang and Huang, 2017; Satter and Sakib, 2016; Lv et al., 2015; Van Nguyen et al., 2017). Since natural language and programming languages have different syntax rules, these techniques often suffer from vocabulary mismatch problems (McMillan et al., 2011). Then, with the popularity of neural networks, several methods were proposed to better capture the semantics of both queries and codes (Gu et al., 2021; Cambronero et al., 2019; Gu et al., 2018; Husain et al., 2019); generally, queries and codes are encoded by neural encoders into a shared representation space. Recently, transformer-based pre-trained models have significantly outperformed previous methods. CodeBERT (Feng et al., 2020) is pre-trained via masked language modeling and replaced token detection. GraphCodeBERT (Guo et al., 2021) leverages data flow as additional information to model the relationships among variables. UniXCoder (Guo et al., 2022) supports understanding and generation tasks at the same time, further boosting performance by jointly using their pre-training tasks (e.g., unidirectional language modeling and a denoising objective). In this work, we mainly consider pre-trained models due to their better performance.
False Negative Cancellation. Several works detect false negatives and remove them from the negative set during pre-training (Huynh et al., 2022; Chen et al., 2022), and Li et al. (2022c) handles them in an iteratively adversarial manner. However, the false negative problem in the fine-tuning of code search has not been investigated yet, and code search also suffers from code duplication in code corpora, as noted by Lopes et al. (2017) and Allamanis (2019). The above-mentioned works can be seen as special cases of the proposed Soft-InfoNCE loss.

Conclusion
In this work, we revisit the commonly used InfoNCE loss in code search and analyze its drawbacks during fine-tuning. By simply inserting weight terms, we propose Soft-InfoNCE to explicitly model the potential relevance of negative codes. We further theoretically analyze its effect on the representation distribution and on mutual information estimation, and its superiority over other loss functions and false negative cancellation methods. We evaluate the Soft-InfoNCE loss on several datasets and models. Experimental results demonstrate the effectiveness of our approach and justify the theoretical analysis.
Equality holds when all code fragments in the batch are actually positives and the model predicts them perfectly.

C.2 Hyper-parameter Settings
The settings of α and β are shown in Table 7. The settings of α and β for BM25 differ because we calculate BM25 scores based only on in-batch data, which results in similar scores for negative pairs. Thus, to better distinguish negative pairs, we set a higher α than for the other two estimation methods. The temperature t is used to tune the distribution of the negative similarity scores $sim_{ij}$ toward a softmax-like distribution, in accordance with our hypothesis. Note that all calculated weights $w_{ij}$ are clamped to be at least 0.1.

D.2 Case Study
We believe a more detailed analysis is beneficial, so we analyze the eight examples in the right part of Fig. 2. As we can see, the positive query can be summarized as extracting data from a binary file. For SimCSE and BM25, the estimated weights are calculated from the natural language descriptions of the positive and negative codes. The second negative example contains tokens like "read from" and "bytes", which makes it look like reading data from a file as well. Thus, BM25 and SimCSE consider it the most similar negative sample and predict a small weight. However, the natural language description is deceptive: the code of the second negative example in fact reads bytes based on a given number, not a file directory. In contrast, since the weights predicted by trained models are calculated from the positive query and the negative codes, truly potential false negative samples, such as the third negative code, are captured. We also find cases where BM25 and SimCSE outperform trained models, namely when code snippets are too long and complicated for trained models to capture their main purpose.

D.3 Comparison with Other Loss Functions
Table 5 shows detailed results on each programming language.

Figure 1 :
Figure 1: Contrastive learning pushes away negative pairs in the representation space. Left: existing works treat negative pairs equally. Right: negative pairs should be pushed away according to their similarity with the query. A thicker arrow means the sample is more negative than others.

Figure 2 :
Figure 2: Similarity estimations by different approaches for a random batch of samples in CSN-Python.

Figure 3 :
Figure 3: MRR and estimated weights under different α and β settings on GraphCodeBERT over CSN-Python.

Table 1 :
Results of different weight estimation approaches under MRR.

Table 3 :
Performance of false negative cancellation methods applied to GraphCodeBERT on CSN-Python.

Table 7 :
α, β and t of different estimation methods.

Table 8 :
Training time efficiency comparison of InfoNCE and Soft-InfoNCE. To compare time efficiency, we take CodeBERT as an example and calculate the time cost per batch, averaged over 30 epochs, using the same batch size for training. The results in Table 8 reveal that Soft-InfoNCE adds negligible time overhead while improving performance. As discussed in Limitations, we aim to demonstrate and justify the effectiveness of Soft-InfoNCE over InfoNCE in this work, and we leave efficiency improvements of Soft-InfoNCE as future work.
Listing 6: Enrich the SKOS relations according to SKOS semantics, including subproperties of broader and symmetric related properties.Listing 8: power down the OpenThreadWpan.
Table 2 reports the average over the different programming languages to demonstrate overall performance.