APP: Adaptive Prototypical Pseudo-Labeling for Few-shot OOD Detection

Detecting out-of-domain (OOD) intents from user queries is essential for a task-oriented dialogue system. Previous OOD detection studies generally work on the assumption that plenty of labeled IND intents exist. In this paper, we focus on a more practical few-shot OOD setting where there are only a few labeled IND data and massive unlabeled mixed data that may belong to IND or OOD. The new scenario carries two key challenges: learning discriminative representations using limited IND data and leveraging unlabeled mixed data. Therefore, we propose an adaptive prototypical pseudo-labeling (APP) method for few-shot OOD detection, including a prototypical OOD detection framework (ProtoOOD) to facilitate low-resource OOD detection using limited IND data, and an adaptive pseudo-labeling method to produce high-quality pseudo OOD&IND labels. Extensive experiments and analysis demonstrate the effectiveness of our method for few-shot OOD detection.


Introduction
Out-of-domain (OOD) intent detection learns whether a user query falls outside the range of pre-defined supported intents. It helps to reject abnormal queries and provide potential directions for future development in a task-oriented dialogue system (Akasaki and Kaji, 2017; Tulshan and Dhage, 2018; Lin and Xu, 2019; Xu et al., 2020; Zeng et al., 2021a,b; Wu et al., 2022a,b; Mou et al., 2022). Since OOD data is hard to label, we need to rely on labeled in-domain samples to facilitate detecting OOD intents.
Previous OOD detection studies generally work on the assumption that plenty of labeled IND intents exist. They require labeled in-domain data to learn intent representations and then use scoring functions to estimate the confidence score of a test query belonging to OOD. For example, Hendrycks and Gimpel (2016) propose Maximum Softmax Probability (MSP), which uses the maximum softmax probability as the confidence score and regards a query as OOD if the score is below a fixed threshold. Xu et al. (2020) further introduce a distance-based method, Gaussian discriminant analysis (GDA), which uses the maximum Mahalanobis distance (Mahalanobis, 1936) to all in-domain class centroids as the confidence score. Although these models achieve satisfying performance, they all rely on sufficient labeled IND data, which limits their applicability to practical scenarios.
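As a concrete illustration of the MSP baseline described above, here is a minimal pure-Python sketch; the function name and the threshold value are ours, chosen for illustration only:

```python
import math

def msp_is_ood(logits, threshold=0.5):
    # Maximum Softmax Probability (Hendrycks and Gimpel, 2016): a query is
    # flagged OOD when its top softmax probability falls below a threshold.
    m = max(logits)                                # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return max(probs) < threshold

print(msp_is_ood([5.0, 0.1, 0.2]))   # peaked distribution -> kept as IND
print(msp_is_ood([0.4, 0.5, 0.45]))  # flat distribution -> flagged as OOD
```

This also makes the overconfidence failure mode easy to see: an over-fitted classifier can output peaked logits even for OOD queries, so MSP keeps them as IND.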
In this paper, we focus on a more practical OOD detection setting: there are only a few labeled IND data and massive unlabeled mixed data that may belong to IND or OOD. We call this Few-Shot OOD Detection. Note that the mixed data has no IND or OOD annotations. Considering that unlabeled data is easily accessible, we believe this setting is more valuable to explore. However, few-shot OOD detection carries two key challenges. (1) Learning discriminative representations using limited IND data: OOD detection requires discriminative intent representations to separate IND&IND intents and IND&OOD intents. However, traditional models based on cross-entropy or supervised contrastive learning (Zeng et al., 2021a) fail to distinguish intent types under the few-shot setting; limited labeled IND data makes it hard to learn discriminative intent representations. (2) Leveraging unlabeled mixed data: unlabeled data contains IND and OOD intents, which benefit both in-domain intent recognition and OOD detection. But it is nontrivial to leverage the mixed data because we have no prior knowledge of the OOD data.
To solve these issues, we propose an Adaptive Prototypical Pseudo-labeling (APP) method for few-shot OOD detection. To learn discriminative representations, we propose a prototypical OOD detection framework (ProtoOOD) using limited IND data. Inspired by the idea of PCL (Li et al., 2020), we introduce an instance-instance loss to pull together samples of the same class and an instance-prototype loss to enforce the prototypes to be the center points of classes. After training a prototypical in-domain classifier on few-shot IND data, we compute the maximum cosine similarity of an input query to all in-domain prototypes as the confidence score. If the score is below a fixed threshold, we believe it is an OOD intent. Compared to existing OOD detection methods (Hendrycks and Gimpel, 2016; Lin and Xu, 2019; Xu et al., 2020; Wu et al., 2022c; Mou et al., 2022), our prototypical OOD detection framework models rich class-level semantics expressed implicitly by training instances. Empirical experiments in Section 3.3 demonstrate that our framework achieves superior performance on both IND and OOD metrics. To leverage unlabeled mixed data, we propose an adaptive pseudo-labeling method to iteratively label mixed data and update the prototypical IND classifier. We find that typical pseudo-labeling methods (Lee, 2013; Cascante-Bonilla et al., 2020; Rizve et al., 2021) work poorly because the model cannot produce high-quality pseudo IND&OOD labels, and they even degrade performance. Therefore, we introduce two instance-prototype margin objectives to adaptively pull pseudo IND samples toward the prototypes and push pseudo OOD samples away. We aim to distinguish the confidence score distributions of IND and OOD data by adjusting the distances between IND or OOD samples and prototypes (see Section 4.2).
Our contributions are: (1) We propose an adaptive prototypical pseudo-labeling (APP) method for few-shot OOD detection. (2) We introduce a prototypical OOD detection framework (ProtoOOD) to learn discriminative representations and facilitate low-resource OOD detection using limited IND data, and an adaptive pseudo-labeling method to produce high-quality pseudo OOD&IND labels to leverage unlabeled mixed data. (3) Experiments and analysis demonstrate the effectiveness of our method for few-shot OOD detection.

Problem Formulation
In the few-shot OOD detection setting, we assume a limited labeled in-domain (IND) dataset D_l = {(x_i, y_i)}_{i=1}^n consisting of n samples drawn from IND, and an unlabeled dataset D_u = {x_i}_{i=1}^m consisting of m unlabeled samples drawn from both IND and OOD. Note that we do not know whether an unlabeled sample in D_u belongs to IND or OOD. Our goal is to distinguish whether an unknown test sample is drawn from IND or not using D_l and D_u. Compared to the traditional OOD detection setting, few-shot OOD detection carries two key challenges: learning discriminative representations using limited IND data and leveraging unlabeled mixed data.

Adaptive Prototypical Pseudo-Labeling
Overall Architecture Fig 1 shows the overall architecture of our proposed adaptive prototypical pseudo-labeling (APP) for few-shot OOD detection. APP includes two training stages. We first use the limited labeled IND data D_l to train a prototypical in-domain classifier that learns discriminative intent representations. Then, we use an adaptive pseudo-labeling method to iteratively label the mixed data D_u and update the prototypical classifier. In the inference stage, we compute the maximum cosine similarity of an input test query to all in-domain prototypes as the confidence score. If the score is below a fixed threshold, we believe it is OOD.
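The inference rule above can be sketched as follows. The helper names, toy 2-d prototypes, and threshold are our illustrative assumptions; a real system would score learned [CLS]-projection features against trained prototype vectors:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def proto_confidence(query, prototypes):
    # Confidence score = maximum cosine similarity to any IND prototype.
    return max(cosine(query, c) for c in prototypes)

def is_ood(query, prototypes, threshold):
    # Below the threshold -> no prototype is close enough -> OOD.
    return proto_confidence(query, prototypes) < threshold

protos = [[1.0, 0.0], [0.0, 1.0]]          # toy prototypes for two IND intents
print(is_ood([0.9, 0.1], protos, 0.8))     # near a prototype -> IND (False)
print(is_ood([-0.7, -0.7], protos, 0.8))   # far from all -> OOD (True)
```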
Prototypical OOD Detection Previous OOD detection models (Hendrycks and Gimpel, 2016; Xu et al., 2020; Zeng et al., 2021a; Wu et al., 2022c; Mou et al., 2022) are customized for settings with sufficient labeled IND data and lack generalization capability for few-shot OOD detection. We find that these models suffer from the overconfidence issue (Liang et al., 2017a,b), where an OOD test sample can receive an abnormally high confidence score and be wrongly classified into in-domain types (see Section 4.1). Therefore, inspired by recent prototype learning work (Li et al., 2020; Cui et al., 2022), we propose a prototypical OOD detection framework (ProtoOOD) to learn discriminative intent representations in the few-shot setting.
As shown in Fig 1, we first take the hidden state of [CLS] to represent an input IND sample and then project it into another embedding space for prototype learning. The prototypes are used as class centroids. Denote C = {c_1, ..., c_|C|} as the set of prototype vectors, which are randomly initialized. We introduce the following objectives. The first is the instance-instance loss, where s_i and s_j are projected features from the [CLS] hidden states, N_{y_i} is the number of examples in the batch that have the same label as y_i, and 1 is an indicator function. This loss pulls together samples of the same class and pushes apart samples from different classes, which helps learn discriminative representations. The second is the instance-prototype loss, where c_i is the corresponding prototype of the sample s_i and |C| is the total number of prototypes. This objective forces each prototype to lie at the center point of its instances. Our final training objective L_pcl = L_ins + L_proto combines the instance-instance loss and the instance-prototype loss.
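The two objectives can be sketched as below. This is a hedged, minimal reconstruction in the spirit of PCL, using cosine similarity with a temperature tau; the paper's exact loss formulation may differ in detail:

```python
import math

def cos(u, v):
    d = sum(a * b for a, b in zip(u, v))
    return d / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def instance_instance_loss(feats, labels, tau=0.1):
    # Pull together same-class instances and push apart different classes
    # (supervised-contrastive style; a sketch, not the paper's exact form).
    n, total = len(feats), 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = sum(math.exp(cos(feats[i], feats[k]) / tau)
                    for k in range(n) if k != i)
        total -= sum(math.log(math.exp(cos(feats[i], feats[j]) / tau) / denom)
                     for j in pos) / len(pos)
    return total / n

def instance_prototype_loss(feats, labels, prototypes, tau=0.1):
    # Each instance should be closest to its own class prototype,
    # driving prototypes toward the centers of their classes.
    total = 0.0
    for s, y in zip(feats, labels):
        denom = sum(math.exp(cos(s, c) / tau) for c in prototypes)
        total -= math.log(math.exp(cos(s, prototypes[y]) / tau) / denom)
    return total / len(feats)
```

On toy features, the instance-prototype loss is low when each sample sits near its own prototype and grows when samples sit near the wrong one, which matches the intended geometry.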
Adaptive Pseudo-Labeling Few-shot OOD detection provides a large corpus of unlabeled data containing mixed IND and OOD intents. How to exploit this data is vital to both in-domain intent recognition and OOD detection. However, it is nontrivial to apply existing semi-supervised methods (Lee, 2013; Cascante-Bonilla et al., 2020; Sohn et al., 2020) to the unlabeled data, because prior knowledge of OOD data is unavailable, making it hard to distinguish unlabeled IND and OOD intents simultaneously (see Section 4.2). Therefore, we propose an adaptive pseudo-labeling method to iteratively label mixed data and update the prototypical classifier.
Specifically, we design two thresholds S and L (S < L): if the maximum cosine similarity of an input query to all in-domain prototypes is higher than the larger threshold L, we believe it belongs to IND, and if the similarity is smaller than the smaller threshold S, we believe it belongs to OOD. The pseudo IND samples can then be directly used by optimizing L_pcl. To use the pseudo OOD samples x_i^OOD, we propose an OOD instance-prototype margin objective to push pseudo OOD samples away from the in-domain prototypes, where c_l is the l-th prototype and M_OOD is a margin hyperparameter. However, we find that simply applying L_ood cannot produce correct pseudo IND samples, because the noisy pseudo OOD labels in L_ood may also contain IND samples and will enlarge the distances of all samples, both IND and OOD, to the prototypes. Therefore, we further propose an IND instance-prototype margin objective.
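A minimal sketch of the two-threshold labeling rule and the two margin objectives. The hinge-style loss forms and the margin values here are our illustrative assumptions, not the paper's exact losses:

```python
import math

def cos(u, v):
    d = sum(a * b for a, b in zip(u, v))
    return d / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pseudo_label(score, S, L):
    # Two thresholds S < L: confident IND above L, confident OOD below S;
    # everything in between stays unlabeled for this round.
    if score > L:
        return "IND"
    if score < S:
        return "OOD"
    return None

def ood_margin_loss(sample, prototypes, m_ood=0.3):
    # Push a pseudo-OOD sample away from every prototype: penalize any
    # cosine similarity that still exceeds the margin M_OOD (a sketch).
    return sum(max(0.0, cos(sample, c) - m_ood) for c in prototypes)

def ind_margin_loss(sample, proto, m_ind=0.7):
    # Pull a pseudo-IND sample toward its assigned prototype until the
    # similarity reaches the margin M_IND (a sketch).
    return max(0.0, m_ind - cos(sample, proto))
```

Using both margins together matches the motivation above: L_ood alone inflates all distances (including for mislabeled IND samples), while L_ind counteracts this by anchoring pseudo-IND samples to their prototypes.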

Datasets
We evaluate our method on two commonly used OOD intent detection datasets, Banking (Casanueva et al., 2020) and StackOverflow (Xu et al., 2015). Following previous work (Mou et al., 2022), we randomly sample 25%, 50%, and 75% of intents as the IND intents and regard all remaining intents as OOD intents. To verify the effectiveness of our method on few-shot OOD detection with mixed unlabeled data, we divide the original training sets of the two datasets into two parts. We randomly sample k = 5, 10, 20 instances of each IND intent from the original training set to construct the few-shot labeled IND dataset, while the remaining training samples are treated as the unlabeled mixed data including both IND and OOD classes.
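The split described above can be sketched as follows; the function name `build_few_shot_split` and the data layout are ours, for illustration:

```python
import random

def build_few_shot_split(dataset, ind_intents, k=5, seed=0):
    # dataset: list of (text, intent) pairs. Sample k labeled examples per
    # IND intent; every remaining example (IND or OOD) becomes unlabeled
    # mixed data with its label discarded.
    rng = random.Random(seed)
    by_intent = {}
    for text, intent in dataset:
        by_intent.setdefault(intent, []).append((text, intent))
    labeled, unlabeled = [], []
    for intent, items in by_intent.items():
        if intent in ind_intents:
            rng.shuffle(items)
            labeled.extend(items[:k])
            unlabeled.extend(text for text, _ in items[k:])
        else:
            unlabeled.extend(text for text, _ in items)
    return labeled, unlabeled
```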

Baselines
To verify the effectiveness of our prototypical OOD detection (ProtoOOD) method under the few-shot OOD detection setting, we compare ProtoOOD with recent OOD detection baselines. For the feature extractor, we use the same BERT (Devlin et al., 2019) backbone. We compare our method with the cross-entropy (CE) training objective and the scoring functions MSP (Hendrycks and Gimpel, 2017), LOF (Lin and Xu, 2019), GDA (Xu et al., 2020), and Energy (Wu et al., 2022c). Besides, we also compare with the state-of-the-art baseline UniNL (Mou et al., 2022). We supplement the details of the relevant baselines in Appendix A.2.
For self-training on the unlabeled mixed data, we compare our adaptive prototypical pseudo-labeling (APP) method with three different training objectives as baselines:
(1) L_pcl: for pseudo IND samples, we only use the instance-instance loss and instance-prototype loss for training. (2) L_pcl + L_ind: for pseudo IND samples, in addition to the instance-instance and instance-prototype losses, the IND margin objective is added. (3) L_pcl + L_ood: we use the instance-instance and instance-prototype losses for pseudo IND samples, and add the OOD margin objective for pseudo OOD samples.

Main Results
Tables 1 and 2 show the performance comparison of different methods on the Banking and StackOverflow datasets, respectively. In general, our proposed prototypical OOD detection framework (ProtoOOD) and adaptive prototypical pseudo-labeling method (APP) consistently outperform all the baselines by a large margin. Next, we analyze the results from two aspects: (1) Advantage of prototypical OOD detection. We compare our proposed ProtoOOD with previous OOD detection baselines. Experimental results show that ProtoOOD is superior to other methods on both IND and OOD metrics, and the less labeled IND data, the more obvious its advantage. For example, on the Banking dataset, ProtoOOD outperforms the previous state-of-the-art baseline Energy by 2.04% (OOD F1) and 2.18% (ALL ACC) in the 20-shot setting, 5.91% (OOD F1) and 8.61% (ALL ACC) in the 10-shot setting, and 8.89% (OOD F1) and 11.68% (ALL ACC) in the 5-shot setting. We also observe that the performance of previous methods drops significantly in the 5-shot setting. We think this is because previous methods rely on a large number of labeled IND samples to learn generalized intent representations. LOF and KNN-based detection methods are easily affected by outliers. GDA is affected by inaccurate estimation of the covariance of the IND cluster distribution in the few-shot setting. MSP and Energy encounter overconfidence problems due to over-fitting of the neural network in the few-shot setting, resulting in OOD being wrongly detected as IND. In contrast, ProtoOOD is insensitive to outlier samples and has good generalization ability.
(2) Comparison of different self-training methods. We introduce two instance-prototype margin objectives for prototype-based self-training on unlabeled data. To understand the importance of the two margin objectives in our adaptive prototypical pseudo-labeling method, we perform an ablation study. The experimental results show that joint optimization of L_pcl, L_ind, and L_ood achieves the best performance. We think that since the unlabeled data contains not only IND samples but also a large number of OOD samples, it is challenging to directly use the model pre-trained on few-shot labeled IND samples for pseudo-labeling, which limits the performance of prototype-based self-training. After introducing the two instance-prototype margin objectives, which adaptively adjust the distances between IND/OOD samples and prototypes, we obtain more reliable pseudo labels stably. We also find that when either margin objective is used alone, it hinders prototype-based self-training. This shows that when we use instance-prototype margin objectives for prototype-based self-training, we need to constrain the distances from both IND and OOD samples to the prototypes at the same time. Besides, to explore the adaptability of adaptive prototypical pseudo-labeling to other OOD detection methods, we discuss it in Appendix C. The conclusion is that our APP method has strong generality and is compatible with other OOD detection methods.
Qualitative Analysis
We can see that the prototypical score better distinguishes IND and OOD in the few-shot setting, while the other scores suffer from serious IND and OOD overlap. This also shows that, compared with previous OOD detection methods, our ProtoOOD method learns more generalized representations in the few-shot setting, which is beneficial for distinguishing IND and OOD. We can also clearly see that as the training process goes on, IND and OOD intents are gradually separated, and there is no case where OOD samples are too close to the prototypes or IND samples are too far away from them. We show the change of score distribution curves of other self-training variants in Appendix B. It can be seen that when we remove L_ood, a large number of OOD samples are identified as IND; when we remove L_ind, a large number of IND samples are classified as OOD. We think this is because the two instance-prototype margin objectives constrain the distances between IND/OOD samples and class prototypes respectively; it is necessary to constrain the distances from both IND and OOD samples to the class prototypes at the same time. This also shows that the two instance-prototype margin objectives are beneficial for stably obtaining reliable pseudo labels.
Changes of class prototypes In addition, we observe the changes of class prototypes during adaptive pseudo-labeling in Fig 5. We can see that the class prototypes gradually move close to the centers of the IND clusters and away from the OOD data. This intuitively reflects that our APP method can adaptively adjust the distances between IND or OOD samples and class prototypes through the two instance-prototype margin objectives, which facilitates pseudo-labeling.
Changes of the number of correct pseudo labels We also count the number of correct pseudo labels in the self-training process. Fig 6 shows the results. The horizontal axis is the epoch of self-training, and the vertical axis is the number of correct pseudo labels. It can be seen that when only using L_pcl, more and more IND data can be correctly detected, but OOD data cannot be effectively detected. We believe this is because we only pre-train the model on few-shot labeled IND data and never encounter OOD samples during pre-training, so simply using L_pcl for self-training cannot produce reliable OOD pseudo labels. When we use L_pcl + L_ood, we get the opposite result: we can only detect more and more OOD samples. We think this is because the OOD margin objective only pushes the prototypes away from OOD data but cannot pull them close to more IND data, so the model cannot correctly detect more IND data. In contrast, when we use the three objectives for joint optimization, we get more and more correct pseudo IND and pseudo OOD samples. This is why APP achieves the best results compared with the other two self-training methods: it promotes prototype optimization toward being close to IND and far from OOD. Since using L_pcl + L_ind gives similar results to using only L_pcl, we only show the results of L_pcl for brevity.

Hyper-parameter Analysis
The effect of the threshold for pseudo-labeling.
We use T to represent the position of the threshold selected for pseudo-labeling. For example, T = 5 means that we choose the fifth highest score as the upper threshold L and the fifth lowest score as the lower threshold S. We compare the effect of different T on OOD detection performance, as shown in Table 3. It can be seen that a smaller T often brings better OOD detection performance. This is because a stricter threshold yields higher pseudo-labeling accuracy, thus improving detection performance. Moreover, under all values of T, our APP method exceeds the performance of the other methods, which proves that APP is robust to different T.
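The T-th-highest / T-th-lowest rule can be sketched directly; the function name is ours:

```python
def select_thresholds(scores, T=5):
    # The T-th highest confidence score becomes the upper threshold L
    # (pseudo-IND above it) and the T-th lowest becomes the lower
    # threshold S (pseudo-OOD below it). Smaller T = stricter thresholds.
    ranked = sorted(scores, reverse=True)
    L = ranked[T - 1]   # T-th highest score
    S = ranked[-T]      # T-th lowest score
    return S, L
```

For example, with scores 1..10 and T = 2, the thresholds are S = 2 and L = 9, so only the two most confident samples at each end receive pseudo labels.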
The effect of the margin value for the IND and OOD margin losses. We select six different sets of IND margin and OOD margin values to illustrate the impact of margin values on detection performance, as shown in Table 5. It can be seen that the margin value has little effect on OOD detection performance, which illustrates the robustness of our proposed margin losses.
The effect of coefficients for the IND and OOD margin losses. We select three different combinations of loss coefficients to illustrate their impact. As shown in Table 5, the coefficients have little effect on OOD detection performance, which illustrates the robustness of our method.

Effect of Different Ratios of OOD Data
We compare the effect of different ratios of OOD data. The results are shown in Table 9. We can see that our proposed prototypical OOD detection outperforms Energy on all IND ratios, which shows the effectiveness of ProtoOOD in few-shot OOD detection. After self-training on the unlabeled data, both IND classification and OOD detection improve. As the ratio of OOD decreases, we find that F1-OOD decreases and F1-IND increases. This is because some OOD samples are more likely to be confused with one of the IND intents, while the larger number of IND classes increases the prior knowledge available for IND learning, enabling the model to learn better IND representations and distinguish them from OOD.

Few-shot OOD detection of ChatGPT
Large Language Models (LLMs) such as ChatGPT have demonstrated strong capabilities across a variety of tasks. In order to compare their performance with ours, we evaluate ChatGPT under our few-shot OOD detection setting. Results are shown in Figure 6. ChatGPT's effectiveness in identifying OOD samples is notably low, as shown by the significantly low Recall-OOD of 29.57 for 5-shot. Through case studies, we discover that it often misclassifies OOD samples as IND. This might be due to a clash between its broad general knowledge and specific domain knowledge. It further suggests that ChatGPT struggles to filter out unusual samples, which is usually the first step in practical uses of OOD detection and constitutes the main goal of this task. On a positive note, its broad general knowledge gives it strong abilities to classify IND samples. We hope to explore strategies that merge the strengths of both models to improve overall performance in future work.

Related Work
OOD Detection Previous OOD detection works can generally be classified into two types: supervised (Fei and Liu, 2016; Kim and Kim, 2018; Larson et al., 2019; Zheng et al., 2020) and unsupervised (Bendale and Boult, 2016; Hendrycks and Gimpel, 2017; Shu et al., 2017; Lee et al., 2018; Ren et al., 2019; Lin and Xu, 2019; Xu et al., 2020) OOD detection. The former assumes that extensive labeled OOD samples exist in the training data; for example, Fei and Liu (2016) and Larson et al. (2019) form an (N+1)-class classification problem where the (N+1)-th class represents the OOD intents. We focus on the unsupervised OOD detection setting where labeled OOD samples are not available for training. Unsupervised OOD detection first learns discriminative representations using only labeled IND data and then employs scoring functions, such as Maximum Softmax Probability (MSP) (Hendrycks and Gimpel, 2017), Local Outlier Factor (LOF) (Lin and Xu, 2019), Gaussian Discriminant Analysis (GDA) (Xu et al., 2020), and Energy (Wu et al., 2022c), to estimate the confidence score of a test query. Inspired by recent prototype models (Li et al., 2020; Cui et al., 2022) for few-shot learning, we propose a prototypical OOD detection framework (ProtoOOD) to facilitate low-resource OOD detection using limited IND data. Besides, Zhan et al. (2022) study few-shot OOD detection by utilizing a generation model to create more IND and OOD samples.
Self-Supervised Learning is an active research area, including pseudo-labeling (Lee, 2013; Cascante-Bonilla et al., 2020), consistency regularization (Verma et al., 2019; Sohn et al., 2020), and calibration (Yu et al., 2019; Xia et al., 2018). Since we focus on few-shot OOD detection, we use the simple pseudo-labeling method and leave other methods to future work. Different from existing self-supervised learning work, which assumes all data is drawn from the same distribution, few-shot OOD detection faces a lack of prior OOD knowledge, making it hard to distinguish unlabeled IND and OOD intents simultaneously. Therefore, we propose an adaptive pseudo-labeling method to iteratively label mixed data and update the prototypical classifier.

Conclusion
In this paper, we establish a practical OOD detection scenario: there are only a few labeled IND data and massive unlabeled mixed data that may belong to IND or OOD. We find that existing OOD work cannot effectively recognize OOD queries using limited IND data. Therefore, we propose an adaptive prototypical pseudo-labeling (APP) method for few-shot OOD detection. Its two key components are the prototypical OOD detection framework (ProtoOOD) and the adaptive pseudo-labeling strategy. We perform comprehensive experiments and analysis to show the effectiveness of APP. We hope to provide new insights into OOD detection and explore more self-supervised learning methods in future work.

Limitations
In this paper, we propose an adaptive prototypical pseudo-labeling (APP) method for few-shot OOD detection, including a prototypical OOD detection framework (ProtoOOD) and an adaptive pseudo-labeling method. Although our model achieves excellent performance, some directions remain to be improved. (1) We consider a basic self-supervised learning (SSL) method, pseudo-labeling; other SSL methods should be considered. (2) Although our model achieves superior performance over the baselines, there is still a large gap compared to full-data OOD detection.
(3) Apart from SSL, unsupervised representation learning methods (Li and Xiangling, 2022; Zeng et al., 2021c) also have an effect, and they are orthogonal to our pseudo-labeling method.

We compute the confidence score of IND data in the validation set, sort the scores from large to small, and select the score at the 75% position as the threshold. We conduct a total of 50 epochs of pseudo-labeling and pseudo-data training. To avoid randomness, we average results over 3 random runs. The training of prototypical OOD detection takes about 1.5 minutes, and the second-stage pseudo-labeling takes about 20 minutes, both on a single Tesla T4 GPU (16 GB of memory). The average number of trainable model parameters is 110.18M.

B Distribution Changes in Adaptive Pseudo-Labeling
We also show the change of score distribution curves of other self-training methods in Figs. 7 and 8. It can be seen that when we remove L_ood, a large number of OOD samples are identified as IND; when we remove L_ind, a large number of IND samples are classified as OOD.

C Generality of adaptive prototypical pseudo-labeling
Our adaptive prototypical pseudo-labeling (APP) method can also be combined with other OOD detection methods. In Table 8, we combine APP with SCL+GDA and compare it with our ProtoOOD. The experimental results show that our proposed ProtoOOD is significantly better than SCL+GDA under the few-shot OOD detection setting. In addition, when we use our APP method for self-training on unlabeled data, SCL+GDA also achieves a performance improvement, which shows that our APP method has strong generality and is compatible with other OOD detection methods.

Figure 1: The overall architecture of our adaptive prototypical pseudo-labeling (APP) for few-shot OOD detection.
where M_IND is a margin hyperparameter (we leave these hyperparameters to Implementation Details in Section A.3) and cos is the cosine similarity. We show the number of correctly predicted pseudo-labeled IND and OOD samples in Fig 6 and find that L_ood and L_ind systematically work as a whole and reciprocate each other. We aim to distinguish the confidence score distributions of IND and OOD data by adjusting the distances between IND or OOD samples and prototypes. The final training loss in the second stage is L = L_pcl + 0.05 L_ind + 0.05 L_ood. We show the overall training process in Figure 1 and summarize the pseudo-code of our method in Algorithm 1.
Algorithm 1: Adaptive Prototypical Pseudo-Labeling
Require: training dataset D_L = {(x_i, y_i)}_{i=1}^n and D_U = {x_i}_{i=1}^m, training steps S, epochs E
Ensure: an OOD intent detection model that classifies an input query into either one IND class or OOD; Y = {1, ..., N} ∪ {OOD}.
1: randomly initialize prototype embeddings μ_j, j = 1, 2, ..., N
2: for step = 1 to S do
3:   sample a mini-batch B from D_L
4:   get the embedding z_i of sample x_i through the projection layer
5:   compute L_ins and L_proto ▷ prototypical contrastive representation learning
6:   update the network parameters and prototype vectors
7: end for
8: for epoch = 1 to E do
9:   align samples x_i from D_U with prototypes μ_j
10:  compute L_ins, L_proto, and L_IND for pseudo-IND samples
11:  compute L_OOD for pseudo-OOD samples
12:  add L_ins, L_proto, L_IND, and L_OOD together and jointly optimize them ▷ prototypical contrastive representation learning and adaptive margin learning
13:  update the network parameters and prototype vectors
14: end for

We only use the few-shot labeled IND dataset to train the prototypical in-domain classifier, and we adopt the adaptive pseudo-labeling method to label the unlabeled mixed IND and OOD data. Besides, since we focus on the few-shot OOD detection setting, we also reduce the number of IND samples in the development set. The test set is the same as the original data. More detailed information about the datasets is shown in Appendix A.1.

Figure 2: Score distribution curves of IND and OOD data using different scoring functions.

4.1 Effect of Prototypical OOD Detection
In few-shot OOD detection, how to use limited IND data to learn discriminative intent representations is a key challenge. In order to compare the performance of different representation learning objectives under few-shot settings, we perform intent visualization of CE, KNCL (Mou et al., 2022), and our prototypical contrastive learning objective, as shown in Fig 3. The results show that the prototype-based method learns more compact IND clusters with a larger distance between OOD and IND, which facilitates OOD detection. To analyze the performance of different scoring functions in the few-shot setting, we compare the GDA, MSP, Energy, and our proposed prototype-based score distribution curves for IND and OOD data, as shown in Fig 2. A smaller overlapping area of IND and OOD intents is more beneficial for distinguishing IND and OOD.

4.2 Effect of Adaptive Pseudo-Labeling
Change of IND and OOD score distribution In order to further explore the advantages of adaptive pseudo-labeling (APP), we show the change of IND and OOD score distribution curves during the self-training process in Fig 4.

Figure 3: Visualization of IND and OOD intents using different IND pre-training losses.
Figure 4: Change of IND and OOD score distribution curves during the self-training process.

Figure 5: Visualization of IND and OOD intents during adaptive pseudo-labeling.

Figure 7: Change of the score distribution curves of IND and OOD data during pseudo-labeling with L_pcl. Since the change trend is the same when using L_pcl and L_pcl + L_ind for self-training, we only show one of them here.

Figure 8: Change of the score distribution curves of IND and OOD data during pseudo-labeling with L_pcl + L_ood.

Table 3: Effect of the threshold on the training performance of adaptive pseudo-labeling on StackOverflow with a 25% IND ratio under 10-shot. Here we compare the results with L_pcl + L_ood to prove the robustness of APP.

Table 5: Effect of margin values on the training performance of adaptive pseudo-labeling on StackOverflow with a 25% IND ratio under 10-shot. Here we compare the results with L_pcl + L_ood to prove the robustness of APP.
The prompt we use is: <Task description> You are an out-of-domain intent detector, and your task is to detect whether the intents of users' queries belong to the intents supported by the system. If they do, return the corresponding intent label, otherwise return unknown. The supported intents include: [Intent 1] ([Example 1] [Example 2] ...), ... The text in parentheses is the example of the corresponding intent. <Response format> Please respond to me with the format of "Intent: XX" or "Intent: unknown". <Utterance for test> Please tell me the intent of this text: [Here is the utterance for test.]
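The template above can be assembled programmatically. This sketch paraphrases the template; the helper name `build_ood_prompt` and the dict-based input format are ours:

```python
def build_ood_prompt(intents, utterance):
    # intents: dict mapping an intent label to a list of example utterances.
    # Mirrors the <Task description> / <Response format> / <Utterance for
    # test> structure of the prompt above.
    supported = ", ".join(
        "{} ({})".format(label, " ".join(examples))
        for label, examples in intents.items()
    )
    return (
        "You are an out-of-domain intent detector, and your task is to "
        "detect whether the intents of users' queries belong to the intents "
        "supported by the system. If they do, return the corresponding "
        "intent label, otherwise return unknown. "
        "The supported intents include: " + supported + ". "
        'Please respond to me with the format of "Intent: XX" or '
        '"Intent: unknown". '
        "Please tell me the intent of this text: " + utterance
    )
```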