APIRecX: Cross-Library API Recommendation via Pre-Trained Language Model

For programmers, learning the usage of the APIs (Application Programming Interfaces) of a software library is important yet difficult. API recommendation tools help developers use APIs by recommending which API should be used next given the APIs that have already been written. Traditionally, language models such as N-gram are applied to API recommendation. However, because software libraries keep changing and new libraries keep emerging, new APIs are common. These new APIs can be seen as OOV (out of vocabulary) words and cannot be handled well by existing API recommendation approaches due to the lack of training data. In this paper, we propose APIRecX, the first cross-library API recommendation approach, which uses BPE to split each API call in each API sequence and pre-trains a GPT-based language model. It then recommends APIs by fine-tuning the pre-trained model. APIRecX can transfer the knowledge of existing libraries to a new library, and can recommend APIs that would previously be regarded as OOV. We evaluate APIRecX on six libraries, and comparisons with two typical API recommendation approaches confirm its effectiveness.


Introduction
An Application Programming Interface (API) is an integral part of a software library. Being familiar with APIs can help improve programming productivity. However, a library tends to contain a large number of APIs, and there can be complex dependencies among them, so understanding all APIs in a library is very challenging, especially for new developers. To facilitate correct and efficient usage of APIs during programming, many API recommendation approaches (Zhong et al., 2009; Nguyen et al., 2016; Xie et al., 2019; Bruch et al., 2009) have been proposed. More specifically, API recommendation aims to automatically recommend a correct API call at the current programming location based on the preceding code. As an example, Listing 1 shows a Java code snippet about opening a text file. Assuming a programmer forgets what to write in Line 6, an API recommendation tool can help the programmer by prompting the most likely API call to be used next; in this case, printStackTrace() will be returned. API recommendation tools do so by learning API usage patterns from a large code corpus. Some tools (Nguyen et al., 2016; Nguyen and Nguyen, 2015) use probabilistic models to learn API usage patterns, while others (Zhong et al., 2009; Wang et al., 2013) use data mining methods to find them. Recently, deep learning based language models have been proposed to model API sequences and have obtained promising results in recommending APIs (Raychev et al., 2014; Yan et al., 2018; White et al., 2015; Nguyen and Nguyen, 2015).
* Junjie Chen is the corresponding author.
However, existing API recommendation tools only focus on improving performance when API usage data are sufficient (i.e., the usage data of the APIs to be recommended are sufficient in the training data). That is, they mostly ignore the OOV (out of vocabulary) problem, which can have a negative impact on the performance of API recommendation. More specifically, when some APIs are unseen in the training data, these approaches cannot recommend them correctly. The OOV problem can be more serious for a new library, since it is very difficult to collect sufficient API usage data.
To conduct API recommendation for new libraries, cross-library API recommendation is a potentially feasible solution: it aims to recommend APIs of new libraries based on the usage data of APIs of other libraries. However, it is still an open challenge due to the inherent OOV problem. For example, as shown in Listing 2, we may rarely (or even never) see SQLException.printStackTrace() in the training set, but the usage of Exception is very common in the training set, and the usages of SQLException and Exception are similar. So if we use a word segmentation algorithm to split SQLException.printStackTrace() into the sequence SQL-Exception-.-print-StackTrace(), we can use the Exception usage pattern learned during training to predict the printStackTrace() method and finally synthesize SQLException.printStackTrace() as the recommendation result.

        e.printStackTrace();
    }
    return connection;
}

Listing 2: An OOV Example in API Recommendation
To achieve the goal of cross-library API recommendation, we draw lessons from the area of text generation in relieving the OOV problem (Sennrich et al., 2016; Hermann et al., 2021). More specifically, we design a framework of cross-library API recommendation, called APIRecX, which consists of three main components, i.e., API segmentation, subword language model building, and API synthesis for recommendation. Since the OOV problem at the API level hampers cross-library API recommendation, APIRecX first incorporates BPE (Byte Pair Encoding) (Provilkov et al., 2020; Sennrich et al., 2016), one of the most widely-used word segmentation methods in text generation, to split each API call into a sequence of subwords. That is, the OOV problem at the API level can be largely relieved at the subword level. Based on a large number of subword data, APIRecX then adopts the "pre-training & fine-tuning" mechanism to build a GPT (Generative Pre-Training) based pre-trained language model, which recommends a subword in each prediction. Since the recommendation process is conducted at the subword level, a complete API call has to be composed from the predicted subwords. Here, APIRecX incorporates beam search for API synthesis.
To evaluate the performance of APIRecX, we conducted an extensive study based on 1,711 Java projects from GitHub involving six libraries in three domains as subjects for mimicking new libraries in the scenario of cross-library API recommendation, and over 14,000 GitHub Java projects that do not involve the former six libraries as training corpus. By comparing with two typical API recommendation approaches, i.e., LSTM-based language model (Yan et al., 2018;White et al., 2015) and N-gram-based language model (Raychev et al., 2014;Karampatsis and Sutton, 2019;Hindle et al., 2012), our experimental results demonstrate the effectiveness of APIRecX for cross-library API recommendation in terms of recommendation accuracy.
To sum up, this work makes the following major contributions:
• We propose the first framework for cross-library API recommendation, consisting of BPE-based API segmentation, subword language model building, and beam-search based API synthesis.
• We are the first to build a GPT-based language model in the area of API recommendation, which is more effective than the existing language models.
• We conduct an extensive study to evaluate our proposed approach, demonstrating its effectiveness in the scenario of cross-library API recommendation.

Approach
In this paper, we propose APIRecX, the first approach for cross-library API recommendation. With APIRecX, we can recommend APIs of some libraries (especially new libraries) by learning from a large amount of API usage data of other libraries.

Overview
Achieving the goal of cross-library API recommendation is challenging.
• First, different libraries tend to not contain APIs with the same names, and thus it is hard to adopt existing approaches to recommend APIs that are not seen in training data. That is, the first challenge is due to the OOV problem at the API level. To overcome it, APIRecX aims to recommend APIs at the subword level through API segmentation. The insight is that an API call usually consists of a set of relatively commonly-used subwords such as Exception, print, etc. Therefore, the OOV problem at the API level can be largely relieved at the subword level.
• Second, APIRecX recommends each subword in turn and then composes a complete API call for recommendation based on the predicted subwords. This means that an API call can be correctly recommended only if all subwords in the API call are recommended correctly, which largely aggravates the recommendation difficulty. To relieve the inaccuracy of API recommendation caused by inaccurate subword prediction, APIRecX incorporates beam search to enlarge the search space of API synthesis, instead of directly recommending an API call composed of the Top-1 subword from each prediction.
With the above two insights, we design a novel GPT-based method in APIRecX to build a subword language model. Here, APIRecX first pre-trains a subword language model based on a large number of API usage data of other libraries in an offline process. When new libraries are released, APIRecX then directly fine-tunes the pre-trained model after collecting a certain amount of API usage data of new libraries, which is much more efficient than retraining based on all API usage data (i.e., pre-training data and fine-tuning data). Also, to make APIRecX a light-weight approach, APIRecX does not build complex data-flow and control-flow graphs, but directly represents a method as an API sequence following the existing work (Gu et al., 2017;Yan et al., 2018;Nguyen et al., 2017). The overview of APIRecX is shown in Figure 1.

BPE-based API Segmentation
APIRecX extracts API sequences following the practice in the existing work (Gu et al., 2017), which extracts all API calls (identifiers & arguments, e.g., DriverManager.getConnection(String)) and control statements with API calls in a method to form an API sequence. Here, all variables in API sequences are replaced with their types. For example, for an API call o.m() where o is an instance of a class C, APIRecX adds C.m to the API sequence.
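The type-substitution step above can be sketched as follows; this toy helper and its symbol table are purely illustrative and are not part of APIRecX's actual extractor:

```python
def canonicalize(call, var_types):
    """Rewrite a call like o.m() as C.m() using the declared type of the receiver.

    `var_types` maps variable names to their declared class names; receivers that
    are not variables (e.g. class names themselves) pass through unchanged.
    """
    receiver, rest = call.split(".", 1)
    return var_types.get(receiver, receiver) + "." + rest

# e is an instance of SQLException, so the call is recorded under its type
print(canonicalize("e.printStackTrace()", {"e": "SQLException"}))
# static-style calls are left untouched
print(canonicalize("DriverManager.getConnection(String)", {}))
```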
Although API names tend to be unique, they usually consist of a set of relatively commonly-used subwords. That is, different API names may include common subwords, and thus the OOV problem at the API level can be largely relieved at the subword level. With this insight, APIRecX splits each API call in an API sequence into a sequence of subwords, conducts follow-up learning and prediction at the subword level, and finally composes a complete API call for recommendation based on the predicted subwords. In this way, it is possible to compose an API call unseen in the training data from subwords, which makes cross-library API recommendation feasible.
Here, APIRecX adopts BPE (Provilkov et al., 2020; Sennrich et al., 2016; Devlin et al., 2018), one of the most widely-used word segmentation methods in text generation, to split an API call into subwords. The reason for choosing BPE is that it achieves a good balance between effectiveness and efficiency. More specifically, compared with character segmentation (Gao et al., 2020), whitespace segmentation (Tezcan et al., 2020; Mikolov et al., 2013), and CamelCase segmentation, BPE is more effective: character segmentation is too fine-grained and thus loses much semantics, while whitespace segmentation is too coarse-grained for API calls and thus cannot effectively relieve the OOV problem. Although CamelCase segmentation achieves a relatively appropriate granularity, it is still coarser than BPE, which causes more OOV words. Taking the Swing domain as an example, 61.9% of subwords are shared between training and test data under BPE, while only 50.9% are shared under CamelCase segmentation. Compared with more advanced methods (e.g., WordPiece (Devlin et al., 2018) and ULM (Chen et al., 2005)), BPE is more efficient but not much less effective, since these methods need to build language models during word segmentation while BPE is based only on frequency. Besides, APIRecX adds a special subword (/t) to mark the end of each API call, which helps APIRecX determine the termination of subword recommendation for an API call. Through this step, APIRecX obtains a large amount of API usage data at the subword level; for example, each API call in the API sequence of Listing 1 is segmented into such a subword sequence.
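As a concrete illustration, frequency-based BPE segmentation can be sketched as follows; the toy corpus, merge count, and example call are invented for illustration and are not APIRecX's actual vocabulary:

```python
from collections import Counter

def merge_pair(word, pair):
    """Merge every adjacent occurrence of `pair` inside one symbol sequence."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

def learn_bpe_merges(corpus, num_merges):
    """Learn merge rules from a corpus of API tokens, most frequent pair first."""
    corpus = [list(word) for word in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word in corpus:
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        corpus = [merge_pair(word, best) for word in corpus]
    return merges

def segment(api_call, merges):
    """Split one API call into subwords via learned merges, then append (/t)."""
    word = list(api_call)
    for pair in merges:
        word = merge_pair(word, pair)
    return word + ["(/t)"]

# Train merges on a toy corpus in which "Exception"-like fragments are frequent,
# then segment an API call that never appeared verbatim in the corpus.
corpus = ["IOException", "Exception", "printStackTrace", "SQLWarning"]
merges = learn_bpe_merges(corpus, num_merges=30)
print(segment("SQLException.printStackTrace()", merges))
```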

Building a Subword Language Model
To build a subword language model, APIRecX adopts the "pre-training & fine-tuning" mechanism as presented above. That is, APIRecX first pre-trains a subword language model based on a large amount of subword data that do not involve the APIs of new libraries, and then fine-tunes the pre-trained model with a small amount of subword data involving the APIs of the library to be recommended. Besides the efficiency benefit presented above, fine-tuning has been demonstrated to be more effective than directly training on the mixture of pre-training data and fine-tuning data (Mao et al., 2015), since the volume of API usage data of new libraries is significantly smaller than that of other libraries, which makes it very difficult to learn the usage patterns of the new libraries' APIs with the latter strategy.
In APIRecX, we design a GPT-based subword language model building method. GPT first maps an API subword sequence S = a_1, ..., a_t (where t is the number of subwords in the sequence) into a vector matrix through the embedding layer Emb, and then obtains the embedding matrix H_0 of the sequence after adding position information through the position embedding matrix W_p:

H_0 = Emb(S) + W_p
Then, GPT feeds the obtained embedding matrix into n stacked Transformer decoder blocks:

H_x = TransformerBlock(H_{x-1}), x ∈ [1, n]

where x is the index of the Transformer layer, and the matrix H_n output by the last decoder block encodes the attention-weighted representation of each subword in the sequence. Then, H_n is multiplied by the transpose of the embedding matrix and normalized by softmax to obtain P(S), which gives the probabilities of all subwords in the vocabulary at each position in the sequence:

P(S) = softmax(H_n · Emb^T)
In the training phase, we calculate the loss between ground truth and P (S) through cross-entropy, and optimize GPT through the Adam optimization algorithm.
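To make the computation concrete, the forward pass and the cross-entropy loss can be sketched in miniature as follows; a single-head attention layer stands in for the stack of decoder blocks, and all shapes and weights are toy values rather than the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(h):
    """One single-head self-attention pass with a causal mask (toy weights)."""
    t, d = h.shape
    scores = h @ h.T / np.sqrt(d)                  # query/key projections omitted
    mask = np.triu(np.ones((t, t)), k=1) * -1e9    # forbid attending to the future
    return softmax(scores + mask) @ h

def forward(ids, emb, w_p):
    h0 = emb[ids] + w_p[: len(ids)]                # H_0 = Emb(S) + W_p
    hn = causal_attention(h0)                      # stand-in for n decoder blocks
    logits = hn @ emb.T                            # multiply by Emb^T (tied weights)
    return softmax(logits)                         # P(S): next-subword distributions

vocab, d_model, t = 50, 16, 5
emb = rng.normal(size=(vocab, d_model))
w_p = rng.normal(size=(512, d_model))
ids = np.array([3, 7, 1, 7, 2])                    # a toy subword-id sequence
p = forward(ids, emb, w_p)

# training loss: cross-entropy between P(S) and the shifted ground truth
loss = -np.log(p[np.arange(t - 1), ids[1:]]).mean()
```

In training, this loss would be minimized with the Adam optimizer, as stated above.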

Beam-search based API Synthesis
With a subword language model, APIRecX recommends a subword in each prediction based on the sequence of subwords before the current position. Let the API call to be recommended be denoted as A_m = {s_m^1, s_m^2, ..., s_m^{n_m}}, where s_m^j refers to the j-th subword in A_m and n_m refers to the number of subwords in A_m, and let the API calls before A_m be denoted as A_1, ..., A_{m-1}. APIRecX predicts the subwords of A_m one by one, appending each predicted subword to the input; that is, the currently predicted subword is used to predict subsequent subwords.
When the prediction result ends with (/t), APIRecX outputs the chain of predicted subwords as the API call for recommendation. For example, in Listing 2, when the developer enters e. on Line 6, APIRecX takes the subword sequence of the preceding code as input. Then APIRecX predicts the next subword based on this input. When APIRecX predicts a subword chain ending with (/t), such as {print, StackTrace()(/t)}, it merges the predicted subwords with S_5^1 and S_5^2 (the subwords already entered) and returns the result to the developer.
However, subword prediction aggravates the difficulty of API recommendation, since it is hard to guarantee the accurate prediction of every subword in an API call. In particular, when a wrong subword is predicted at a certain position, the predictions of all subsequent subwords can also be affected, since the wrong subword will be used to predict them. In each prediction, every subword in the vocabulary is assigned a probability. By considering all subwords in each prediction and using each of them for subsequent predictions, the correct chain of subwords (used to compose a complete API call) cannot be missed, but exploring such an enormous combination space is unaffordable. Therefore, it is still challenging to recommend a complete API call based on subword-level prediction.
To balance the accuracy and efficiency of API recommendation, APIRecX adopts widely-used beam search (Freitag and Al-Onaizan, 2017; Shu and Nakayama, 2018; Huang et al., 2017). More specifically, beam search considers the Top-K subwords (K refers to the beam size) in each prediction rather than only the Top-1 subword or all subwords. For each of the Top-K subwords in a prediction, it then produces Top-K subwords and obtains K^2 chains of subwords, and then preserves the Top-K chains according to their chain probabilities for the next prediction. Following the existing work (Shu and Nakayama, 2018; Huang et al., 2017; Freitag and Al-Onaizan, 2017; Karampatsis et al., 2020), we calculate the chain probability of a chain of subwords as:

P(w_m^1, ..., w_m^i) = ∏_{j=1}^{i} p(w_m^j)

where p(w_m^j) (which is short for p(w_m^j | s_1^1, ..., s_1^{n_1}, ..., s_{m-1}^{n_{m-1}}, w_m^1, ..., w_m^{j-1})) is the probability of the j-th subword in the chain (w_m^1, ..., w_m^i) predicted by the subword model. To relieve the effectiveness problem caused by the monotonicity of traditional beam search, APIRecX preserves the memory of poor-quality incomplete chains produced during beam search, following existing work in text generation (Shu and Nakayama, 2018). More specifically, APIRecX maintains a candidate pool that stores the incomplete chains other than the Top-K among the K^2 chains produced in each prediction. When the K^2 chains produced from the Top-K chains of the last prediction have smaller chain probabilities than chains in the candidate pool, APIRecX chooses the Top-K chains from among both the current K^2 chains and all chains in the candidate pool, rather than from the current K^2 chains only. In this way, APIRecX has a chance to make up for wrong choices in previous predictions.
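This candidate-pool variant of beam search can be sketched as follows; the scoring model, subword names, and probabilities below are toy stand-ins for the trained subword language model, not the paper's actual system:

```python
import math

def beam_search(step_probs, beam_size, end_token="(/t)", max_iter=10):
    """Beam search that lets displaced ('poor-quality') incomplete chains in a
    candidate pool compete again with newly expanded chains."""
    beams = [((), 0.0)]              # (chain of subwords, log-probability)
    pool, complete = [], []          # displaced incomplete chains; finished chains
    for _ in range(max_iter):
        expansions = []
        for chain, lp in beams:
            for sub, p in step_probs(chain).items():
                expansions.append((chain + (sub,), lp + math.log(p)))
        expansions.extend(pool)      # rescued chains compete with the new ones
        expansions.sort(key=lambda c: c[1], reverse=True)
        beams, pool = expansions[:beam_size], expansions[beam_size:]
        complete.extend(b for b in beams if b[0][-1] == end_token)
        beams = [b for b in beams if b[0][-1] != end_token]
        alive = beams + pool
        # stop once every complete chain outscores every surviving incomplete chain
        if complete and (not alive or min(c[1] for c in complete) > max(c[1] for c in alive)):
            break
        if not beams:
            break
    return sorted(complete, key=lambda c: c[1], reverse=True)

def toy_model(chain):
    """Hypothetical next-subword distributions, mimicking the e. example."""
    table = {
        (): {"print": 0.6, "close": 0.4},
        ("print",): {"StackTrace()": 0.9, "(/t)": 0.1},
        ("close",): {"(/t)": 1.0},
        ("print", "StackTrace()"): {"(/t)": 1.0},
    }
    return table.get(chain, {"(/t)": 1.0})

print(beam_search(toy_model, beam_size=2))
```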
Besides, APIRecX improves the termination condition of the beam search process, following the existing work (Huang et al., 2017) in text generation: the search stops when the smallest chain probability among all produced complete chains is larger than the largest chain probability among all incomplete chains (including incomplete chains in both the candidate pool and the current Top-K chains).

Experimental Setup

Table 1 shows the information about the three experiments, where Column "#API" is the number of APIs in the corresponding domain libraries, Column "#Project" is the number of Java projects that are collected from GitHub and use the APIs of the domain libraries, and Column "#Sequence" is the number of API sequences extracted from the collected projects.
Besides, we adopted the corpus provided by the existing work (Allamanis and Sutton, 2013) for pre-training. The corpus contains over 14,000 Java projects from GitHub after removing the projects involving the above three domains. From these projects, we extracted over 5,000,000 API sequences as pre-training data. Table 2 shows the information about the pre-training corpus, where Column "#Projects" is the number of Java projects in the corpus, Column "#LOC" is the total number of lines of code, Column "#Methods" is the total number of Java methods, and Column "#Sequence" is the number of API sequences extracted from the corpus.

Selecting test and fine-tune data
We used 10 projects (splitting the domain projects into 10 groups and then selecting the project with the largest number of domain API calls in each group) as test projects, and extracted API call sequences from them. For each sequence of API calls, we produced a set of API call sequences, each of which contains a "hole", as test data. Specifically, we produced them by digging a "hole" at each API call from the second one onward, in turn. Then, for each API call sequence with a "hole", we used the sequence of API calls before the "hole" as input for predicting the API call in the "hole". After selecting the test data, we sampled data from the remaining data at 5 different sampling ratios (0.2%, 1%, 10%, 50%, and 100%) as fine-tuning data. We then used these sampled data to fine-tune the pre-trained model following the fine-tuning process presented in Section 2.3.
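The hole-digging procedure can be sketched as follows; the function name and the example sequence are invented for illustration:

```python
def make_holes(api_sequence):
    """Turn one API call sequence into test cases by digging a 'hole' at each
    position from the second call onward: the calls before the hole are the
    model input, and the held-out call is the prediction target."""
    return [(api_sequence[:i], api_sequence[i]) for i in range(1, len(api_sequence))]

calls = ["File.new", "FileReader.new", "BufferedReader.readLine", "BufferedReader.close"]
for context, target in make_holes(calls):
    print(context, "->", target)
```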

Baselines
We adopted the traditional LSTM-based API recommendation approach (Yan et al., 2018; White et al., 2015; Chen et al., 2019) and the N-gram based API recommendation approach (Raychev et al., 2014; Karampatsis and Sutton, 2019) for comparison, in order to quantitatively investigate the superiority of APIRecX over traditional API recommendation approaches. We followed the parameter settings of these two works (Yan et al., 2018; Raychev et al., 2014) to train the baseline tools on the data we collected; the specific parameter settings are shown in Table 6.

Parameters
The parameters comprise the model training parameters and the beam search parameters in the API recommendation process. Table 6 lists all the parameters of APIRecX and the baselines. The original GPT contains a 12-layer transformer decoder block with 12-head attention and nearly 100 million parameters, which requires an extremely large amount of data to train. However, compared with collecting text data, it is harder to collect such a huge amount of API usage data, and thus we tailored the structure of the original GPT to match the scale of our training data. Specifically, our tailored GPT uses a 6-layer transformer decoder block with 8-head attention. Besides, GPT handles fixed-length sequences, so we set the subword-sequence length to 512. In our context, the fixed-length sequence refers to the fixed-length subword sequence processed from an API call sequence. For the API call sequences in our dataset, the average length is 41, the largest length is 2,280, and the percentage of subword sequences longer than 512 is only 0.4%. Moreover, the longer a sequence is, the more difficult it is to model. Therefore, our setting (512) reaches a good trade-off, following the existing study (Devlin et al., 2018). The baseline model parameters were set according to the previous work (Yan et al., 2018; Raychev et al., 2014). We trained APIRecX for 15 epochs in the pre-training stage, and adopted the early-stop strategy to terminate the fine-tuning process. For the baseline approaches, we adopted the early-stop strategy to terminate training, according to the previous work.
The beam search process involves two parameters: beam size and max iteration. Beam size is the width of the beam search, and max iteration is the maximum number of search steps. More details of the parameter settings are given in the Appendix.

Evaluation Metric
To evaluate the performance of APIRecX, we adopted Top-N accuracy following the existing work on API recommendation (Xie et al., 2019; Nguyen et al., 2016; Nguyen and Nguyen, 2015). Each API recommendation approach produces a ranking list of API calls for recommendation. Top-N accuracy measures the percentage of cases in which the correct API call is included in the Top-N results among all locations in the test set; higher Top-N accuracy indicates better performance. Following the existing work (Nguyen et al., 2016; Nguyen and Nguyen, 2015; Xie et al., 2019; Yan et al., 2018), we set N to 1, 5, and 10 respectively. Note that we focus on the recommendation of domain APIs, so we only report the Top-N accuracy for domain APIs. Table 3 presents the comparison results between APIRecX and the baselines under five sampling ratios in the three domains.
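The metric can be computed as follows (a minimal sketch with invented inputs):

```python
def top_n_accuracy(ranked_lists, ground_truth, n):
    """Fraction of test locations whose correct API call appears in the Top-n
    entries of the recommended ranking list."""
    hits = sum(truth in ranking[:n] for ranking, truth in zip(ranked_lists, ground_truth))
    return hits / len(ground_truth)

# two test locations: the first is a Top-2 hit, the second is a miss
rankings = [["close()", "printStackTrace()"], ["flush()", "read()"]]
truths = ["printStackTrace()", "write()"]
print(top_n_accuracy(rankings, truths, n=2))  # → 0.5
```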

Overall effectiveness
From this table, APIRecX performs better than the two baselines under all studied sampling ratios in all three domains in terms of all metrics. For example, under the sampling ratio of 0.2% in the IO domain, APIRecX achieves 52.9% Top-1 accuracy while the two baselines achieve only 30.6% and 16.5%; the improvements are 72.87% and 220.61%, respectively. We also performed a Wilcoxon rank sum test to investigate whether our approach significantly outperforms LSTM and N-gram across all domains. The results show that all p-values are smaller than 0.004 (<0.05) for Top-1/Top-5/Top-10 accuracy, demonstrating that the improvement of our approach is statistically significant.
We then analyzed why APIRecX performs well, as shown in Table 4. In this table, the first three rows present the percentage of domain APIs in the test set covered by the training data, the percentage of subwords from domain APIs in the test set covered by the training data, and the percentage of unseen APIs among the correct recommendation results; the last row presents the number of API call types successfully recommended by APIRecX under the sampling ratio of 0.2%.
From Table 4, under the sampling ratio of 0.2%, the API coverage is small (10.9∼49.3%): only 25.5% of APIs are covered by the training data on average. However, the subword coverage is large (61.9∼89.3%), with an average of 77.7%, indicating the power of API segmentation in handling the OOV problem. Indeed, APIRecX is able to recommend APIs unseen in both the pre-training and fine-tuning data. Among the APIs correctly recommended by APIRecX, an average of 28.1% (131.3 API call types) are unseen APIs, demonstrating its ability for cross-library API recommendation.

Effectiveness of Beam Search
We compare our beam search strategy in APIRecX with traditional beam search (Freitag and Al-Onaizan, 2017; Shu and Nakayama, 2018) under different beam sizes. Here, we use the JDBC domain with the sampling ratio of 10% as the representative; the comparison results are shown in Table 5. From this table, our beam search performs better than traditional beam search under all studied beam sizes in terms of all metrics, demonstrating the contribution of the improved beam search strategy. Meanwhile, its contribution is more obvious for Top-5 and Top-10 accuracy than for Top-1 accuracy, because the chains of subwords rescued by the improved beam search rarely have larger chain probabilities than the Top-1 chain, due to the small probability of certain subword predictions. More specifically, the probability of a complete API call (e.g., printStackTrace() in Line 6 of Listing 1) is the product of the probabilities of a chain of subwords (e.g., print, StackTrace, ()). Although the candidate pool of the improved beam search can relieve the effectiveness problem caused by the monotonicity of traditional beam search by preserving the memory of poor-quality incomplete chains produced during the beam-search process, the small probabilities of these poor-quality incomplete chains can lead to a small probability for the corresponding complete API call, making it hard to rank as Top-1. Taking Line 6 in Listing 1 as an example, if "StackTrace" has a small probability, the probability of the complete API call becomes small, so it is hard to rank as Top-1. Therefore, the improved beam search brings a less apparent improvement in Top-1 accuracy. Also, APIRecX performs stably under different beam sizes.
Related Work

API recommendation
In the literature, some statistical learning based (Nguyen and Nguyen, 2015; Liu et al., 2018; Raychev et al., 2014; Xie et al., 2019) and pattern mining based (Zhong et al., 2009; Wang et al., 2013; Fowkes and Sutton, 2016; Xie et al., 2019) API recommendation approaches have been proposed, but none of them deals with the OOV problem, and thus they cannot be effective in the scenario of cross-library API recommendation. For example, Xie et al. (2019) proposed HiRec, which improves pattern-mining based approaches by utilizing the hidden information of project-specific code via call graphs when mining API usage patterns. Nguyen and Nguyen (2015) designed a graph-based statistical language model that represents source code as graphs for API recommendation. Different from them, APIRecX is the first approach for cross-library API recommendation, handling the OOV problem via a GPT-based pre-trained subword language model.

Pre-trained models across languages
Our approach is inspired by pre-training in the multilingual scenario (Chi et al., 2020; Huang et al., 2019; Yang et al., 2020a,b, 2019). For example, Garneau et al. (2020) obtain cross-lingual N-gram vectors, and the translation table between two languages is inferred from the similarity of the N-gram vectors of the two languages. Different from these works, ours targets the problem of API recommendation rather than cross-lingual problems, which have different characteristics, and APIRecX builds a GPT-based subword language model for API recommendation. CodeBERT (Feng et al., 2020) obtains a general language model for programming languages by pre-training on six different programming languages, and can be applied to different downstream tasks. CodeBERT may seem to be a natural baseline, but we do not use it for comparison because it requires two-way information, while we regard API recommendation as a one-way text generation task. When developers use an API, they usually write API calls sequentially (forward), and the task of API recommendation is to predict future API calls; there is no reverse (backward) information available in practice. Therefore, CodeBERT cannot be applied to our problem.

Conclusions
We propose APIRecX, the first approach for cross-library API recommendation, which can automatically recommend API calls for new libraries. APIRecX first splits each API call into a sequence of subwords to relieve the OOV problem at the API level. It then pre-trains a GPT-based subword language model on a large amount of API usage data from other libraries. By fine-tuning the pre-trained model with a sample of API usage data of new libraries, APIRecX conducts subword prediction and incorporates beam search to compose a complete API call for recommendation. We conducted an extensive study based on six libraries of three domains (mimicking new libraries) and 14,000 GitHub Java projects for pre-training, demonstrating the effectiveness of APIRecX. However, our work also has a limitation regarding the generalizability of our results and findings. Although we invested significant time and effort in preparing datasets, conducting experiments, and analyzing results, our experiments involved only one programming language with three domains. The performance of our neural architecture, and especially the findings on transfer learning, could differ for other programming languages or libraries. In the future, we will try to remove this limitation by applying our approach to more languages and libraries.
The source code of APIRecX and the experimental data can be found at https://github.com/yuningkang/APIRecX.

B Training Strategy Evaluation

"Scratch" means training APIRecX from scratch using only three different proportions of fine-tuning data. As shown in Table 8, the "pre-train & fine-tune" mechanism is better than the other two one-step strategies at all three sampling ratios, and its superiority is most evident under low sampling ratios.

C Beam Size Evaluation
We evaluate the effectiveness of different beam sizes under three different sampling ratios in the JDBC domain to find a suitable beam size. Table 7 lists the average recommendation accuracy achieved with 5 different beam sizes under three different sampling ratios in the JDBC domain. Table 7 shows that, as the beam size increases, both the duration and the accuracy increase. After the beam size reaches 20, the accuracy increases rather slowly and remains basically unchanged. To balance the performance and efficiency of APIRecX, we set the beam size to 20 in the other experiments.