TEXTOIR: An Integrated and Visualized Platform for Text Open Intent Recognition

TEXTOIR is the first integrated and visualized platform for text open intent recognition. It is composed of two main modules: open intent detection and open intent discovery. Each module integrates most of the state-of-the-art algorithms and benchmark intent datasets. It also contains an overall framework connecting the two modules in a pipeline scheme. In addition, this platform has visualized tools for data and model management, training, evaluation and analysis of the performance from different aspects. TEXTOIR provides useful toolkits and convenient visualized interfaces for each sub-module, and designs a framework to implement a complete process to both identify known intents and discover open intents.


Introduction
Analyzing user intents plays a critical role in human-machine interaction services (e.g., dialogue systems). However, many current dialogue systems are confined to recognizing user intents in closed-world scenarios and cannot handle uncertain open intents. As shown in Figure 1, it is easy to identify specific purposes, such as Flight Booking and Restaurant Reservation. Nevertheless, because user intents are varied and uncertain, predefined categories may be insufficient to cover all user needs. That is, some user utterances may carry open intents that fall outside the predefined categories. Distinguishing these open intents from known intents is valuable: it helps improve service quality and makes it possible to discover fine-grained classes for mining potential user needs.
We divide open intent recognition (OIR) into two modules: open intent detection and open intent discovery. The first module aims to identify n-class known intents and detect the one-class open intent (Yan et al., 2020; Lin and Xu, 2019; Shu et al., 2017). It can identify known classes but fails to discover specific open classes. The second module further groups the one-class open intent into multiple fine-grained intent-wise clusters (Vedula et al., 2020; Lin et al., 2020; Perkins and Yang, 2019). Nevertheless, the adopted clustering techniques are not able to identify known categories.
The two modules have made substantial progress with various advanced methods on benchmark datasets. However, some issues remain that make future research difficult. Firstly, there are no unified and extensible interfaces to integrate the various algorithms of the two modules, which hinders further model development. Secondly, the current methods of the two modules lack convenient visualized tools for model management, training, evaluation and result analysis. Thirdly, both modules have limitations for OIR: neither of them can identify known intents and discover open intents simultaneously. Therefore, OIR remains at the theoretical level, and it needs an overall framework that connects the two modules to complete the whole process.
To address these issues, we propose TEXTOIR, the first integrated and visualized text open intent recognition platform. The platform has the following features: (1) It provides toolkits for open intent detection and open intent discovery, respectively. The toolkits contain flexible interfaces for data, configuration, backbone and method integration. Specifically, it integrates a series of advanced models for the two modules. Each module supports a complete workflow, including data and backbone preparation with different assigned parameters, training, and evaluation. It provides standard and convenient modules to add new methods. More detailed information can be found at https://github.com/thuiar/TEXTOIR.
(2) It designs an overall framework combining two sub-modules naturally, achieving a complete OIR process. The overall framework integrates the advantages of two modules, which can automatically identify known intents and discover open intent clusters with recommended keywords.
(3) It provides a visualized interface for utilization. Users can leverage the provided methods or add their own datasets and models for open intent recognition. We provide a front-end interface for the two modules and the pipeline module. Each of the two modules supports model training, evaluation and detailed result analysis of different methods. The pipeline module leverages both modules and shows the complete text OIR results. More detailed information can be found at https://github.com/thuiar/TEXTOIR-DEMO.
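The pipeline idea in feature (2) can be illustrated with a minimal sketch. It is not the TEXTOIR implementation: `run_pipeline`, `OPEN_LABEL`, and the `detect` and `discover` callables are hypothetical names introduced here, standing in for any detection method and any clustering method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

OPEN_LABEL = "<open>"  # hypothetical marker for the single open class


def run_pipeline(
    utterances: Sequence[str],
    detect: Callable[[str], str],
    discover: Callable[[List[str]], List[int]],
) -> Dict[str, object]:
    """Stage 1 labels each utterance as a known intent or open;
    stage 2 groups only the open utterances into clusters."""
    known: List[Tuple[str, str]] = []
    open_texts: List[str] = []
    for text in utterances:
        label = detect(text)
        if label == OPEN_LABEL:
            open_texts.append(text)
        else:
            known.append((text, label))

    # The discovery stage returns one cluster id per open utterance.
    cluster_ids = discover(open_texts) if open_texts else []
    clusters: Dict[int, List[str]] = {}
    for text, cid in zip(open_texts, cluster_ids):
        clusters.setdefault(cid, []).append(text)
    return {"known": known, "open_clusters": clusters}
```

Any detector that returns either a known label or `OPEN_LABEL` can be plugged into the first stage, and any function that assigns cluster ids can be plugged into the second.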
As shown in Figure 3, we provide unified data-processing interfaces. They support preparing data in the formats required by the two modules. For example, they sample known intents and labeled data with the assigned parameters for training and evaluation. Besides these labeled data, the remaining unlabeled data are also leveraged for open intent discovery. Users can view detailed dataset statistics on the front-end webpage and manage their datasets.
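The sampling step described above can be sketched as follows. This is an illustrative reading of the text, not the platform's code: the function name, its parameters, and the per-sample random labeling are assumptions made for the example.

```python
import random
from typing import List, Set, Tuple


def sample_known_and_labeled(
    samples: List[Tuple[str, str]],
    known_intent_ratio: float,
    labeled_ratio: float,
    seed: int = 0,
) -> Tuple[Set[str], List[Tuple[str, str]], List[str]]:
    """Pick a subset of intents as known, keep labels for a fraction of
    their samples, and treat every other sample as unlabeled."""
    rng = random.Random(seed)
    all_intents = sorted({label for _, label in samples})
    n_known = max(1, round(len(all_intents) * known_intent_ratio))
    known_intents = set(rng.sample(all_intents, n_known))

    labeled: List[Tuple[str, str]] = []
    unlabeled: List[str] = []
    for text, label in samples:
        # Assumption: each known-intent sample keeps its label
        # independently with probability labeled_ratio.
        if label in known_intents and rng.random() < labeled_ratio:
            labeled.append((text, label))
        else:
            unlabeled.append(text)
    return known_intents, labeled, unlabeled
```

With `known_intent_ratio=0.5` on a dataset of four intents, two intents are treated as known; samples of the other two intents, plus any known-intent samples that lose their label, form the unlabeled pool used for discovery.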

Models
Our platform integrates a series of advanced and competitive models for two modules, and provides toolkits with standard and flexible interfaces.

Open Intent Detection
This module leverages partially labeled known intent data for training. It aims to identify known intents and detect samples that do not belong to known intents. These detected samples are grouped into a single open intent class during testing. We divide the integrated methods into two categories: threshold-based and geometrical feature-based methods.
The threshold-based methods consist of MSP (Hendrycks and Gimpel, 2017), DOC (Shu et al., 2017), and OpenMax (Bendale and Boult, 2016). These methods are first pre-trained under the supervision of the known intent classification task. Then, they apply a probability threshold to detect low-confidence samples as open intent. The geometrical feature-based methods include DeepUnk (Lin and Xu, 2019) and ADB (Zhang et al., 2021a). DeepUnk adopts a metric-learning method to learn discriminative intent features, and a density-based method to detect the open intent samples as anomalies. ADB further uses the boundary loss to learn adaptive decision boundaries.
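As a minimal illustration of the threshold-based idea, the sketch below applies an MSP-style rule to the logits of a classifier trained on known intents. The function names, the default threshold of 0.5, and the use of -1 as the open label are assumptions made for the example.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the known-intent logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_predict(
    logits: Sequence[float], threshold: float = 0.5, open_label: int = -1
) -> int:
    """Return the most probable known class, or open_label when the
    maximum softmax probability is below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else open_label
```

A confident prediction such as `msp_predict([5.0, 0.0, 0.0])` keeps its known class, whereas nearly uniform logits such as `[0.1, 0.0, 0.05]` fall below the threshold and are assigned to the open class.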

Open Intent Discovery
This module uses both known and open intent samples as inputs, and aims to obtain intent-wise clusters by learning from similarity properties with clustering technologies. As suggested in (Zhang et al., 2021b; Lin et al., 2020), the integrated methods are divided into two parts, including unsupervised and semi-supervised methods.
The unsupervised methods include K-Means (KM) (MacQueen et al., 1967), agglomerative clustering (AG) (Gowda and Krishna, 1978), SAE-KM, DEC (Xie et al., 2016), and DCN (Yang et al., 2017). The first two methods adopt the GloVe (Pennington et al., 2014) embedding, and the last three methods leverage a stacked auto-encoder to extract representations. These methods do not need any labeled data as prior knowledge and learn structured semantic-similar knowledge from unlabeled data.

Interfaces
We provide a series of interfaces for the two modules. Firstly, the backbones are flexible and unified. For example, the primary backbone is the pre-trained BERT (Devlin et al., 2019) model, and the platform supports adding new BERT-based models with different downstream tasks. The open intent discovery module also supports other backbones for unsupervised clustering. Secondly, each module has common data-loaders that follow the formats required by the adopted backbones. They encode unified data vectors from the prepared data as mentioned in Section 2.1. Thirdly, the parameter configurations are convenient. We extract common parameters (e.g., known intent ratio, dataset, etc.) for each module and support adding different sets of hyper-parameters for tuning each method. Finally, each approach integrates standard components of training, evaluation, and other specific functions.

Pipeline Framework
The two modules of open intent detection and discovery are closely related. However, no overall framework exists that successively invokes the two modules to both identify known intents and discover open intents. TEXTOIR addresses this issue with a proposed pipeline framework, as shown in Figure 3 and

Training and Evaluation
Our platform provides visualized interfaces for model training and evaluation. For each method, users can change the main hyper-parameters to tune the model. When training starts, the platform automatically creates a record for the training process, whose state can be monitored by the users. When the training process finishes successfully, the trained model and related parameters are saved for further utilization. For model evaluation, the predicted results are observed from different views. Firstly, the overall performance is shown with the number of correct and false samples for each intent class. On this basis, the number of samples falsely predicted as each other class is further shown, to analyze which intents are easily confused with respect to the ground truth. Secondly, the influences of the known intent ratio and labeled ratio are shown with line charts. Users can observe the results on different selected datasets and evaluation metrics.
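The first evaluation view, per-class correct and false counts together with the fine-grained confusions, can be sketched as below. The function name and return layout are hypothetical; the sketch only shows how such counts can be derived from ground-truth and predicted labels.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def per_class_report(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[Counter, Counter, Counter]:
    """Count correct and false predictions per ground-truth class, and
    count each (true class, predicted class) confusion pair."""
    correct: Counter = Counter()
    wrong: Counter = Counter()
    confusions: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
        else:
            wrong[t] += 1
            confusions[(t, p)] += 1
    return correct, wrong, confusions
```

Calling `confusions.most_common()` on the third return value lists the most frequently confused intent pairs first.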

Open Intent Detection
This module shows the results of identified known intent samples and detected open intent samples. For threshold-based methods, it visualizes the distribution of known and open intents over different confidence scores, which may help in selecting a suitable probability threshold, as shown in Figure 5. For geometrical feature-based methods, it visualizes the intent representations on a two-dimensional plane. Specifically, t-SNE (Maaten and Hinton, 2008) is applied to the high-dimensional features for dimensionality reduction. Moreover, we show some auxiliary information for each point (e.g., the center and radius of ADB), as shown in Figure 6.

Open Intent Discovery
For unsupervised and semi-supervised clustering methods, it shows the geometric positions of each produced cluster center with corresponding labels. These centers are categorized into the known and open classes, as shown in Figure 7. Users can mine the similarity relations of both known and open intents from observation of center distribution.
As the labels of clusters are not available in real scenarios, we adopt the KeyBERT toolkit to extract keywords for open intents at the sentence level and the cluster level. Furthermore, it calculates the confidence scores of the keywords in the cosine similarity space. The top-3 keywords are recommended for each discovered open intent with their respective confidence scores, as shown in Figure 4.
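The keyword recommendation step can be illustrated with a small sketch that ranks candidate keywords by cosine similarity to a cluster representation and keeps the top three. It does not call the KeyBERT API; the function names are hypothetical, and the embeddings are assumed to be precomputed by some encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_keywords(
    cluster_vec: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate keywords by similarity to the cluster embedding
    and return the k best, each with its score as a confidence value."""
    scored = [(word, cosine(cluster_vec, vec)) for word, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

In this sketch the cosine score itself plays the role of the confidence score attached to each recommended keyword.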

Experiments
We use four intent benchmark datasets mentioned in Section 2.1 to verify the performance of our TEXTOIR platform. The known intent ratios are varied between 25%, 50% and 75%. The labeled proportions are varied between 50% and 100%. To evaluate the fine-grained performance, we calculate the accuracy score (ACC) on known intents and the Normalized Mutual Information (NMI) score on open intents. We use two state-of-the-art methods of open intent detection and discovery (ADB and DeepAligned) as the components of the pipeline framework. The results are shown in Table 1. The pipeline framework successfully connects the two modules, and achieves competitive and robust results in different settings. It overcomes the shortcomings of the individual modules by using the first module to identify known intents and the second module to discover open intents.

Table 1: The open intent recognition results of ADB+DeepAligned on four datasets. "KIR" and "LR" mean the known intent ratio and labeled ratio respectively. "Known" denotes the accuracy score on known intents, and "Open" denotes the NMI score on open intents.
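The ACC and NMI scores used in this section can be computed as in the sketch below. The text does not state which NMI normalization is used; the sketch assumes normalization by the arithmetic mean of the two entropies, and the function names are chosen for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mutual information between two labelings, normalized by the
    arithmetic mean of their entropies (an assumed normalization)."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    count_t, count_p = Counter(y_true), Counter(y_pred)
    mi = sum(
        (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denom = entropy(y_true) + entropy(y_pred)
    return 2.0 * mi / denom if denom > 0 else 1.0
```

NMI is invariant to a renaming of cluster ids, which is why it suits the open intents: discovered clusters carry no ground-truth names, whereas ACC on known intents compares labels directly.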
Related Work

Open Intent Detection
Open intent detection has attracted much attention in recent years. It aims to identify known intents while detecting the open intent. The threshold-based methods use an assigned threshold to detect the open intent. For example, MSP (Hendrycks and Gimpel, 2017) computes the softmax confidence score of each known class and regards the low-confidence samples as open. OpenMax (Bendale and Boult, 2016) uses the Weibull distribution to produce the open class probability. DOC (Shu et al., 2017) replaces the softmax with the sigmoid activation function and fits a Gaussian distribution to the outputs for each known class. ODIN (Liang et al., 2018) adopts temperature scaling and input preprocessing to obtain more discriminative probabilities for detecting the open intent. The geometrical feature-based methods use the characteristics of intent features to solve this task. For example, DeepUnk (Lin and Xu, 2019) first uses the margin loss to learn discriminative features. Then, it adopts a density-based algorithm, LOF (Breunig et al., 2000), to detect anomalous data as the unknown intent. ADB (Zhang et al., 2021a) learns an adaptive decision boundary for each known class in Euclidean space. However, all the methods mentioned above fail to discover fine-grained open classes.
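To make the geometrical idea concrete, the sketch below applies a boundary-style decision rule in Euclidean space: a feature is assigned to its nearest class centroid if it lies within that class's radius, and to the open class otherwise. This is a simplified illustration in the spirit of ADB, not its implementation; the centroids and radii are assumed to be already learned, and the function name and the -1 open label are hypothetical.

```python
import math
from typing import Sequence


def boundary_predict(
    feature: Sequence[float],
    centroids: Sequence[Sequence[float]],
    radii: Sequence[float],
    open_label: int = -1,
) -> int:
    """Nearest-centroid classification with a per-class radius check."""
    dists = [math.dist(feature, c) for c in centroids]
    nearest = min(range(len(dists)), key=dists.__getitem__)
    # Outside the nearest class's boundary: treat the sample as open.
    return nearest if dists[nearest] <= radii[nearest] else open_label
```

With centroids `[[0, 0], [10, 0]]` and radii `[1.0, 2.0]`, a point at `[9, 0]` falls inside the second class boundary, while a point at `[5, 0]` lies outside both boundaries and is treated as open.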

Open Intent Discovery
Open intent discovery leverages clustering methods to help find fine-grained clusters as open intents. Unsupervised clustering methods include the traditional partition-based method K-Means (MacQueen et al., 1967), the hierarchical method Agglomerative Clustering (Gowda and Krishna, 1978), and the density-based method DBSCAN (Ester et al., 1996). There are also clustering methods based on deep neural networks, such as Deep Embedded Clustering (DEC) (Xie et al., 2016), joint unsupervised learning (JULE) (Yang et al., 2016), and Deep Clustering Network (DCN) (Yang et al., 2017).
As unsupervised methods may not work well on open settings (Lin et al., 2020), researchers try to leverage some prior knowledge to improve the performance. Some methods use pairwise constraints to guide the clustering process, such as KCL (Hsu et al., 2018), MCL (Hsu et al., 2019) and CDAC+ (Lin et al., 2020). DTC (Han et al., 2019) extends DEC with temporal and ensemble information. DeepAligned (Zhang et al., 2021b) leverages clustering information to obtain aligned targets for self-supervised feature learning. However, all these clustering methods fail to identify the specific known intent classes.

Conclusion
We propose the first open intent recognition platform TEXTOIR, which integrates two complete modules: open intent detection and open intent discovery. It provides toolkits for each module with common interfaces and integrates multiple advanced models and benchmark datasets for the convenience of further research. Additionally, it realizes a pipeline framework to combine the advantages of two modules. The overall framework achieves both identifying known intents and discovering open intents. A series of visualized surfaces help users to manage, train, evaluate, and analyze the performance of different methods.