Unsupervised Keyphrase Extraction by Learning Neural Keyphrase Set Function



Introduction
Keyphrase Extraction (KE) is the task of extracting a keyphrase set that provides readers with high-level information about the key ideas or important topics described in the document. KE methods can be divided into supervised (Sun et al., 2021; Song et al., 2021, 2022a) and unsupervised (Bennani-Smires et al., 2018; Sun et al., 2020). The former requires large-scale annotated training data and is often domain-specific, whereas unsupervised methods do not need annotated data (Hasan and Ng, 2014). Therefore, in this paper, we focus on Unsupervised Keyphrase Extraction (UKE).
Currently, most UKE methods consist of two components: candidate set generation and keyphrase importance estimation. The former uses heuristic rules to obtain a candidate set for a given document. The latter scores each individual phrase from the candidate set; recent methods address this with pre-trained embeddings (Peters et al., 2018; Devlin et al., 2019). These methods independently estimate the relevance between each phrase in the candidate set and the document as the importance of the phrase from a point-wise perspective, as illustrated in Figure 1(a). Unfortunately, such point-wise models are essentially phrase-level UKE approaches: they cannot take into account the interactions among all candidate phrases and fail to consider the semantics of the complete candidate set. This makes them more inclined to select keyphrases containing high-frequency words while ignoring the coupling of multiple phrases. As a result, the diversity of the selected keyphrases suffers, as quantified in our experiments (see Table 6), leading to suboptimal performance.
To address the above issue, we investigate extracting keyphrases globally from a set-wise perspective (as illustrated in Figure 1(b)) and conceptualize the UKE task as a document-set matching problem, as shown in Figure 2. Specifically, the proposed UKE system is based on a document-set matching framework as the set function that measures the relevance between a candidate set and its corresponding document in the semantic space via a siamese-based neural network. The set function is learned with a margin-based triplet loss and orthogonal regularization, effectively capturing the similarities between documents and candidate sets. However, it is intractable to exactly search for the optimal subset of the candidate set with the set function during inference, because the subset space is exponentially large and the set function is non-decomposable. To this end, we propose an approximate method whose key idea is to learn a set extractor agent for efficient inference. Concretely, after the neural keyphrase set function is well trained, we use it to calculate the document-set matching score as a reward. Then, we adopt a policy-gradient training strategy to train the set extractor agent to extract the subset with the highest reward from numerous candidate subsets. Ideally, the optimal subset is semantically closest to the document, as shown in Figure 2. Exhaustive experiments demonstrate the effectiveness of our model SetMatch: it effectively covers the ground-truth keyphrases, obtains higher recall than traditional heuristics, and outperforms recent strong UKE baselines.
We summarize our contributions as follows:
• Instead of individually scoring each phrase, we formulate the UKE task as a document-set matching problem and propose a novel set-wise framework.
• Since exact search with the document-set matching function is intractable, we propose an approximate method that learns a set extractor agent to search for the keyphrase set.
• Experiments show that our model achieves superior performance compared with state-of-the-art UKE baselines on three benchmarks.

Methodology Overview
In this paper, keyphrases are globally selected from a set-wise perspective. More formally, consider a KE system: given the document D, it first generates a candidate set C and then selects an optimal subset S* ⊆ C from the candidate set C. To achieve this goal, we propose a two-stage model (SetMatch), consisting of candidate set generation and a neural keyphrase set function F_s. First, candidate set generation aims to generate a candidate set C from the document D with high recall, covering as many ground-truth keyphrases as possible (Sec 2.1). Second, a neural keyphrase set function F_s is learned to estimate the document-set matching score (Sec 2.2), which is used to guide the keyphrase set extractor agent in searching for an optimal subset (Sec 2.3).

Candidate Set Generation
We adopt various strategies to obtain a candidate set that covers the ground-truth keyphrases as fully as possible. These strategies fall into two categories: heuristic rules and pre-trained language models (fine-tuned on keyphrase extraction or generation tasks). The former first tokenizes the document, tags it with part-of-speech tags, and extracts candidate phrases based on those tags; it then keeps only noun phrases consisting of zero or more adjectives followed by one or more nouns. The latter uses neural keyphrase extraction or generation models based on Pre-trained Language Models (PLMs) fine-tuned on other corpora. The details are described in Sec 5.
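As a hedged illustration of the heuristic rule above (candidate noun phrases matching the pattern "zero or more adjectives followed by one or more nouns", i.e., JJ* NN+ in Penn Treebank tags), the following sketch operates on pre-tagged tokens; the function name and the simple state machine are ours, not the authors' implementation.

```python
def extract_candidates(tagged_tokens):
    """Keep noun phrases of zero or more adjectives (JJ*) followed by
    one or more nouns (NN*). tagged_tokens: list of (word, POS) pairs
    from any part-of-speech tagger."""
    candidates, buf, has_noun = set(), [], False
    for word, tag in tagged_tokens:
        if tag.startswith("NN"):          # a noun extends the current phrase
            buf.append(word)
            has_noun = True
        elif tag.startswith("JJ"):        # an adjective starts a new phrase
            if has_noun:                  # ...if a noun run just completed
                candidates.add(" ".join(buf))
                buf, has_noun = [], False
            buf.append(word)
        else:                             # any other tag ends the phrase
            if has_noun:
                candidates.add(" ".join(buf))
            buf, has_noun = [], False
    if has_noun:                          # flush a phrase ending the document
        candidates.add(" ".join(buf))
    return candidates
```

Note that buffered adjectives without a following noun are discarded, matching the rule that every candidate must contain at least one noun.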

Neural Keyphrase Set Function
To estimate importance from a set-wise perspective, we propose a novel neural keyphrase set function F_s, which is implemented with a document-set matching framework (Sec 3). With the neural keyphrase set function F_s, we can score all candidate subsets of the candidate set C and thus find the optimal subset S* based on these scores.

Keyphrase Set Extractor Agent
However, it is intractable to exactly search for an optimal subset with the keyphrase set function F_s during inference, because the subset space is exponentially large and the keyphrase set function F_s is non-decomposable. Therefore, we propose a keyphrase set extractor agent to search for the optimal subset S*. The agent is trained via a policy-gradient strategy, using the keyphrase set function F_s as the reward, to select the optimal subset S* as the keyphrases (Sec 4). Finally, we infer the optimal subset with the learned set extractor agent rather than with F_s.
Neural Keyphrase Set Function (F_s)

There are many ways to judge whether a keyphrase set is good or bad for a document D. One intuitive way is through a matching framework. Therefore, we formulate the neural keyphrase set function F_s as a document-set matching task in which the document D and the candidate set C are matched in a semantic space, as shown in Figure 2. We then propose a margin-based triplet loss with multi-perspective orthogonal regularization L_E to optimize the Siamese-BERT Auto-Encoder architecture. The following section details how we instantiate our neural keyphrase set function F_s using a simple siamese-based architecture.

Siamese-BERT Auto-Encoder
Inspired by the siamese network structure (Bromley et al., 1993), we construct a Siamese-BERT Auto-Encoder architecture to match the document D and the candidate set C. Concretely, our Siamese-BERT Auto-Encoder consists of two BERTs with shared weights, two auto-encoders, and a cosine-similarity layer that predicts the document-set score. The overall architecture is shown in Figure 4. Given a batch of candidate sets {C_i}^M_{i=1} and documents {D_i}^M_{i=1}, we adopt the original BERT (Devlin et al., 2019) to derive semantically meaningful embeddings, where M denotes the batch size and h_{C_i}, h_{D_i} ∈ R^{d_r} are the representations of the i-th candidate set C_i and document D_i within a training batch. Here, we use the vector of the '[CLS]' token from the top BERT layer as the representation of the candidate set C and the document D. Next, we employ two auto-encoders (with encoders ϕ_1, ϕ_2 and decoders ϕ′_1, ϕ′_2, as shown in Figure 4) to transfer the BERT representations into a latent space, where ϕ_1, ϕ_2 ∈ R^{d_r×d_l} and ϕ′_1, ϕ′_2 ∈ R^{d_l×d_r} are learnable parameters. Let ĥ_{C_i}, ĥ_{D_i} ∈ R^{d_l} denote the representations of the candidate set C_i and the document D_i in the latent space, respectively. Finally, their similarity score is measured by cosine similarity.
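The matching computation above can be sketched in numpy as follows, with random stand-ins for the shared BERT '[CLS]' vectors: linear auto-encoders map the d_r-dimensional representations into the d_l-dimensional latent space, and the document-set score is the cosine similarity of the latent vectors. The weight initialization and the absence of nonlinearities are our assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
d_r, d_l = 768, 512          # BERT dimension and latent dimension, as in the paper

# Encoder/decoder weights of the two auto-encoders (illustrative stand-ins
# for the learnable parameters phi_1, phi_2 and their decoders).
W_enc_C = rng.normal(0, 0.02, (d_r, d_l))
W_dec_C = rng.normal(0, 0.02, (d_l, d_r))
W_enc_D = rng.normal(0, 0.02, (d_r, d_l))
W_dec_D = rng.normal(0, 0.02, (d_l, d_r))

def match_score(h_C, h_D):
    """Cosine similarity between the latent candidate-set and document vectors."""
    z_C, z_D = h_C @ W_enc_C, h_D @ W_enc_D
    return float(z_C @ z_D / (np.linalg.norm(z_C) * np.linalg.norm(z_D)))

def reconstruction_loss(h, W_enc, W_dec):
    """Auto-encoder reconstruction error (the ingredient of L_C and L_D)."""
    return float(np.mean((h @ W_enc @ W_dec - h) ** 2))
```

In the actual model these weights are trained jointly with BERT; here they only illustrate the shapes and the score computation.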

Margin-based Triplet Loss with Orthogonal Regularization
To fine-tune the Siamese-BERT Auto-Encoder, we use a margin-based triplet loss with orthogonal regularization to update the weights. We use a simple and intuitive way to generate positive candidate sets C+_i and negative candidate sets C−_i. Most existing embedding-based UKE models (Liang et al., 2021; Ding and Luo, 2021) truncate the document to satisfy the encoding requirements of BERT. However, truncating the document loses a small number of phrases, thus reducing the recall of the candidate set C. Therefore, we generate a positive candidate set C+_i (i.e., A†_1, as illustrated in Table 3) before truncating the document D, and a negative candidate set C−_i (i.e., A_1, as illustrated in Table 3) after truncating the document D (more details in Sec 5). The loss L_T is then computed with margin δ. The basic idea of L_T is to let the positive candidate set, which has higher recall, receive a higher document-set matching score than the negative candidate set with lower recall. Furthermore, we propose orthogonal regularization from multiple perspectives, which explicitly encourages each representation within a batch to be different from the others. This is inspired by Bousmalis et al. (2016), who adopt orthogonal regularization to encourage representations across domains to be as distinct as possible. Here, L_CC encourages the similarities between all candidate sets within a batch to be as distinct as possible, L_DD does the same for all documents within a batch, and L_CD does the same for candidate sets and documents within a batch. The final loss function L_E of the neural keyphrase set function is therefore a weighted combination with balance factors λ_1, λ_2, λ_3, where L_D and L_C denote the reconstruction losses of the two auto-encoders, computed with the L2 norm ||X||_2 of each element in a matrix X. After the set function F_s is well trained, we fix its parameters and only use it as a non-differentiable metric to measure the document-set matching score, without further optimizing its parameters.
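The loss ingredients above can be sketched as follows. Since the paper's exact equations are not reproduced here, the off-diagonal Gram-matrix penalty is our reading of the multi-perspective orthogonal regularization, and the matched document-set pair is deliberately excluded from the cross term.

```python
import numpy as np

def triplet_loss(score_pos, score_neg, delta=1.0):
    """Margin-based triplet loss L_T: the positive (higher-recall) candidate
    set should outscore the negative set by at least the margin delta."""
    return max(0.0, delta - score_pos + score_neg)

def orthogonal_reg(Z_C, Z_D):
    """Sum of squared off-diagonal similarities within a batch
    (L_CC + L_DD + L_CD). Z_C, Z_D: (M, d_l) latent representations."""
    def off_diag_sq(G):
        return float(np.sum(G ** 2) - np.sum(np.diag(G) ** 2))
    return (off_diag_sq(Z_C @ Z_C.T)     # candidate sets vs. candidate sets
            + off_diag_sq(Z_D @ Z_D.T)   # documents vs. documents
            + off_diag_sq(Z_C @ Z_D.T))  # cross pairs; matched pair excluded
```

A perfectly "orthogonal" batch, where every representation is distinct from the others, drives the regularizer to zero.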

Keyphrase Set Extractor Agent
As mentioned before, it is intractable to precisely search for the optimal subset with the set function. Therefore, we propose a keyphrase set extractor agent to efficiently search for an optimal subset. We first exploit a pre-trained BERT model to obtain representations of the phrases in the candidate set C and of the document D, and then learn a subset sampling network to sample a subset S from the candidate set C based on these representations. After obtaining the candidate subset S, we use the keyphrase set function F_s to calculate the document-set matching score F_s(S, D) as the reward R(S, D) for optimizing the keyphrase set extractor agent to extract an optimal subset S* via reinforcement learning.

Encoding Network
We employ a pre-trained BERT model to obtain H and h_D, the representations of the phrases in the candidate set C and of the document D, respectively. The representations are obtained by average pooling over the output of the last BERT layer, where h_D denotes the document representation and h_{p_n} is the representation of the n-th phrase in the candidate set C (which contains N candidate phrases).

Candidate Subset Searching
To obtain a candidate subset S from the candidate set C, we adopt a self-attention layer as the extractor network to search for subsets. We calculate the attention function on all candidate phrases in the candidate set C simultaneously, packed together into a matrix H, and compute the matrix of outputs, where W_1, W_2 ∈ R^{d_r×d_r} are trainable parameters and the REP operator converts the input vector into an R^{N×d_r} matrix by repeating it over N rows. The probability distribution π_θ(S, D) over the candidate set C is then obtained via a fully-connected layer, where θ denotes the trainable parameters of the keyphrase set extractor and p is a candidate phrase in the candidate set C. To obtain the candidate subset, we rank the phrases in the candidate set C by the predicted probability π_θ(S, D) and extract the top-ranked K (K < N) phrases as the candidate subset S.
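The extractor head can be sketched as below. The tanh scoring and the summation over hidden dimensions are assumptions where the paper's equations are omitted, but the REP operator, the parameters W_1 and W_2, the softmax distribution π_θ, and the top-K selection follow the description above.

```python
import numpy as np

def select_subset(H, h_D, W1, W2, K):
    """H: (N, d_r) phrase representations; h_D: (d_r,) document vector.
    Returns the indices of the top-K phrases and the distribution pi_theta."""
    N = H.shape[0]
    rep = np.tile(h_D, (N, 1))                 # REP: repeat h_D into (N, d_r)
    scores = np.tanh(H @ W1 + rep @ W2).sum(axis=1)
    exp = np.exp(scores - scores.max())        # numerically stable softmax
    probs = exp / exp.sum()                    # pi_theta over candidates
    topk = np.argsort(-probs)[:K]              # rank and keep K < N phrases
    return topk, probs
```

During training, subsets would be sampled from π_θ for exploration; the deterministic top-K shown here corresponds to inference.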

Reinforce-Guided Selection
We exploit an exploitation-and-exploration training strategy to optimize the parameters of the set extractor agent. We adopt the policy gradient algorithm (REINFORCE; Williams, 1992) to optimize the policy π_θ(S, D). Specifically, in each training iteration, we first use the policy π_θ(S, D) to sample a candidate subset S from the candidate set C of the document D. Next, the well-trained set function F_s computes the document-set matching score F_s(S, D) between the candidate subset S and the document D. Finally, we treat this score as the reward R(S, D) and optimize the policy π_θ(S, D) with the policy gradient. Inspired by the self-critical training strategy (Rennie et al., 2017), we propose a new teacher-critical training strategy to regularize the reward R(S, D): we use the top-K keyphrases predicted by a baseline (e.g., JointGL (Liang et al., 2021)) as a reference set Ŝ and calculate its document-set matching score F_s(Ŝ, D) to regularize the reward R(S, D). Ideally, when maximizing the reward, the teacher-critical strategy ensures that our model obtains an optimal candidate subset S* that is better than the reference set Ŝ. The expected gradient can then be approximated accordingly. Generally, the policy π_θ(S, D) is gradually optimized over training iterations to find better candidate subsets S with higher rewards R(S, D). The candidate subset S* with the highest reward R(S*, D) is the final predicted keyphrase set for the document D.
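Under the teacher-critical strategy, one REINFORCE-style surrogate loss looks like the sketch below; F_s is treated as a black-box scorer, and the exact gradient estimator in the paper may differ from this form.

```python
import numpy as np

def teacher_critical_loss(log_probs_selected, reward, teacher_reward):
    """REINFORCE surrogate with the teacher's score as baseline:
    advantage = R(S, D) - F_s(S_hat, D). Minimizing this loss increases the
    log-probability of subsets that beat the teacher reference set, and
    decreases it for subsets that fall short."""
    advantage = reward - teacher_reward
    return -advantage * float(np.sum(log_probs_selected))
```

In a full training loop this scalar would be differentiated with respect to the policy parameters θ by an autograd framework; numpy is used here only to show the arithmetic.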

Datasets and Evaluation Metrics
We verify our model on three benchmarks: the DUC2001 (Wan and Xiao, 2008), Inspec (Hulth, 2003), and SemEval2010 (Kim et al., 2010) datasets. Both the keyphrases and their corresponding documents are preprocessed with the Porter Stemmer. The statistics are provided in Table 1.
Following recent studies (Liang et al., 2021; Ding and Luo, 2021; Zhang et al., 2022), the performance of our model SetMatch and the selected baselines is evaluated using Precision (P), Recall (R), and F1 measure (F1) on the top 5, 10, and 15 ranked phrases.
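For reference, F1@k under exact match of stemmed phrases can be computed as in the sketch below; stemming and deduplication conventions vary across papers, so this is illustrative rather than the authors' exact evaluation script.

```python
def f1_at_k(predicted, gold, k):
    """predicted: ranked list of (stemmed) phrases; gold: ground-truth phrases.
    Returns the F1 score over the top-k predictions."""
    topk = predicted[:k]
    tp = len(set(topk) & set(gold))          # exact-match true positives
    p = tp / len(topk) if topk else 0.0      # precision over top-k
    r = tp / len(gold) if gold else 0.0      # recall over the gold set
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)
```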

Implementation Details
Candidate Set Generation. All models use Stanford CoreNLP Tools for tokenization, part-of-speech tagging, and noun phrase chunking. Three regular expressions, A_1, A_2, and A_3 (shown in Table 3), are used to extract noun phrases as the candidate set via the Python package NLTK. Furthermore, we use two fine-tuned pre-trained language models (B_1 and B_2, as shown in Table 3) to generate candidate sets. For the truncated document, we take the entire document as input to generate a candidate set (document-level). For the document without truncation, we leverage the fine-tuned PLMs to obtain candidate keyphrases from each sentence individually and combine them into a candidate set (sentence-level).
Neural Keyphrase Set Function. We set the margin δ of the margin-based triplet loss to 1, λ_1 = λ_2 = λ_3 = 1/3, and the learning rate to 5e-5 for both the neural keyphrase set function and the keyphrase set extractor agent. We train for twenty epochs on a single NVIDIA A4000 GPU with a batch size of 2, d_r = 768, and d_l = 512. We set K to 15 and N to 30. In this paper, we use , and A_1 ∪ B_1 to obtain candidate sets for the Inspec, DUC2001, and SemEval2010 datasets, respectively.
Candidate Set Pruning. The subset sampling idea of our subset sampler is intuitive, but it suffers from a combinatorial explosion: how should we determine the number of phrases in the candidate set, and should we score all possible subsets? To alleviate these difficulties, we propose a simple pruning strategy that uses the recent baseline JointGL (Liang et al., 2021) to prune the candidate set from a point-wise perspective, keeping the top-ranked N phrases as the candidate set C.

Results and Analysis
Table 2 illustrates the experimental results on the DUC2001, Inspec, and SemEval2010 datasets.
Analysis. The experimental results show that globally extracting keyphrases from a set-wise perspective helps our model outperform recent state-of-the-art baselines across the benchmark datasets. The detailed analysis is as follows: (1) The keyphrases of a document are usually considered unordered and treated as a set; similar claims have been made in the keyphrase generation literature (Ye et al., 2021; Xie et al., 2022). However, most UKE models score and extract keyphrases from a point-wise perspective, which also ranks keyphrases in order. The impact of this ranking is visible in the results: it yields higher scores for F1@5 and F1@10 but a smaller boost for F1@15. Instead, our model globally extracts keyphrases from the set-wise perspective. Not only does it model the relationships between phrases within the document at a deeper level, it also ensures that the extracted keyphrase set is semantically closer to its corresponding document in the semantic space. Moreover, the keyphrases predicted by our model are unordered.
(2) Most existing embedding-based UKE models obtain the candidate set and the phrase embeddings after truncating the document. Note that this is done for two main reasons: first, it simplifies calculating the document-phrase matching similarity; second, it is forced by the input-length limitation of the pre-trained language model. However, truncating documents reduces the quality of candidate sets and thus the performance of keyphrase extraction. Our document-set matching framework alleviates this problem, allowing our model to consider all phrases in the original document when forming a candidate set. Accordingly, the improvement of our model on the DUC2001 and SemEval2010 datasets (with long documents) is larger than on the Inspec dataset (with short documents). Compared with the underlined results in Table 2, our model achieves 10.65%, 7.44%, and 11.32% improvements in F1@5, F1@10, and F1@15 on the SemEval2010 dataset.

Ablation Study
Effect of generating candidate sets with different strategies. The details of the candidate generation strategies and the associated performance are reported in Table 3. For ease of description, A* denotes A_1, A_2, A_3 and B* denotes B_1, B_2. We summarize the detailed analysis as follows: (1) The ensemble candidate set generation strategy obtains higher recall than using A* or B* alone.
(2) A* obtains more stable and higher recall than B* in most cases on the three benchmark datasets.
(3) B* obtains higher recall on long-document datasets, such as SemEval2010.
(4) Intuitively, the longer the document, the more candidates are lost through truncation.

Effect of training with different loss functions.
As illustrated in Table 4, our ablation study considers the effect of the reconstruction loss (L_C + L_D), the margin-based triplet loss (L_T), and the orthogonal regularization (L_CC + L_DD + L_CD) on the SemEval2010 dataset. To directly verify the effectiveness of the neural keyphrase set function, we propose a simple method to construct pseudo labels from the ground-truth keyphrase set S^r_i of the i-th document D_i. Here, we calculate score(·) via F1@M, which evaluates the F1 score over all phrases in the candidate set C. After obtaining pseudo labels, we use the keyphrase set function to predict scores (following Eq. 12) instead of F1@M, and verify its effectiveness by comparing the predicted scores with the pseudo labels to obtain an accuracy. From the results in Table 4, we find that L_T distinguishes positive and negative samples well, and that the orthogonal regularization significantly improves performance.
Effect of different training samples. We adopt different positive and negative samples to train the keyphrase set function F_s, as illustrated in Table 5:

SetMatch                           Acc    F1@5   F1@10  F1@15
Positive: A†_1, Negative: A_1      0.96   14.44  20.79  24.18
Positive: A†_2, Negative: A_2      0.91   14.10  19.69  22.09
Positive: A†_3, Negative: A_3      0.93   14.32  20.08  22.17

The best results are obtained by using A†_1 and A_1.
Effect of the teacher-critical training strategy.
To verify the effectiveness of the proposed teacher-critical training strategy, we compare it against a series of fixed values used to regularize the reward R(S, D). Figure 5 shows the results under different regularization values on the SemEval2010 dataset. The best results are achieved with our teacher-critical training strategy, while dropping the regularization of the reward R(S, D) entirely (i.e., setting the fixed value to 0) significantly damages the final performance. Moreover, by adopting the teacher-critical training strategy, our model can be treated as an optimizer on top of the SOTA UKE baselines.

Diversity Evaluation
To evaluate diversity, we follow previous studies (Bahuleyan and Asri, 2020) and define two evaluation metrics: (1) Duplicate% = (1 − #Unique Tokens / #Extracted Tokens) × 100; (2) EditDist: string matching carried out at the character level. Through this evaluation metric, we can calculate the pairwise Levenshtein distance between extracted keyphrases.
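The two diversity metrics can be sketched directly from their definitions; the Levenshtein routine below is a standard single-row dynamic program, and averaging over all phrase pairs is our reading of how EditDist@15 is aggregated.

```python
from itertools import combinations

def duplicate_pct(phrases):
    """Duplicate% = (1 - #unique tokens / #extracted tokens) * 100."""
    tokens = [t for p in phrases for t in p.split()]
    return (1 - len(set(tokens)) / len(tokens)) * 100 if tokens else 0.0

def levenshtein(a, b):
    """Character-level edit distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution (cost 0 on a character match)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def mean_pairwise_editdist(phrases):
    """Average Levenshtein distance over all pairs of extracted keyphrases."""
    pairs = list(combinations(phrases, 2))
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
```

Lower Duplicate% and higher mean edit distance both indicate a more diverse keyphrase set.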

As shown in Table 6, the results demonstrate that globally extracting keyphrases from a set-wise perspective avoids the repeated selection of phrases containing high-frequency words and accounts for the coupling of multiple keyphrases.

Case Study
To further provide an intuitive understanding of how our model benefits from the set-wise perspective, we present an example in Table 7. In the given example, "trajectories" and "feature" are high-frequency words in the document. Therefore, if keyphrases are extracted individually from a point-wise perspective, phrases containing these two words receive higher scores and are extracted as keyphrases. From a set-wise perspective, this issue is alleviated and diverse keyphrases are extracted. These results further demonstrate the effectiveness of extracting keyphrases via the document-set matching framework.
Related Work

Recently, embedding-based methods (Bennani-Smires et al., 2018; Saxena et al., 2020; Sun et al., 2020; Liang et al., 2021; Ding and Luo, 2021; Song et al., 2022b; Zhang et al., 2022), benefiting from the development of pre-trained embeddings (Mikolov et al., 2013; Peters et al., 2018; Devlin et al., 2019), have achieved significant performance. Bennani-Smires et al. (2018) rank and extract phrases by estimating the similarities between the embeddings of phrases and the document. Sun et al. (2020) improve the embeddings via a pre-trained language model (ELMo; Peters et al., 2018) instead of static embeddings (Word2Vec; Mikolov et al., 2013). Ding and Luo (2021) model the phrase-document relevance at different granularities via the attention weights of the pre-trained language model BERT. Liang et al. (2021) enhance the phrase-document relevance with boundary-aware phrase centrality to score each phrase in the candidate set individually. Zhang et al. (2022) leverage a masking strategy and rank candidates by the textual similarity between embeddings of the source document and the masked document. Unlike existing UKE models, we extract keyphrases from a set-wise perspective by learning a neural keyphrase set function, which globally extracts a keyphrase set from the candidate set of the document.

Conclusion and Future Work
We formulate the unsupervised keyphrase extraction task as a document-set matching problem and propose a novel set-wise framework that matches the document against candidate subsets sampled from the candidate set. Since it is intractable to exactly search for the optimal subset with the document-set matching function, we propose an approximate algorithm for efficient search that learns a keyphrase set extractor agent via reinforcement learning. Extensive experimental results show that SetMatch outperforms the current state-of-the-art unsupervised keyphrase extraction baselines on three benchmark keyphrase extraction datasets, demonstrating the effectiveness of our proposed paradigm.
Lately, the emergence of Large Language Models (LLMs) has garnered significant attention from the computational linguistics community. For future research, exploring how to effectively utilize LLMs to generate and rank candidates for keyphrase extraction may be an exciting and valuable direction (i.e., LLM-based UKE).

Acknowledgments
We thank the three anonymous reviewers for carefully reading our paper and for their insightful comments and suggestions. This work was partly supported by the Fundamental Research Funds for the Central Universities (2019JBZ110); the National Natural Science Foundation of China under Grant 62176020; the National Key Research and Development Program (2020AAA0106800); the Beijing Natural Science Foundation under Grant L211016; CAAI-Huawei MindSpore Open Fund; and the Chinese Academy of Sciences (OEIP-O-202004).

Limitations
In this paper, we propose a novel set-wise framework to extract keyphrases globally. To verify the effectiveness of the new framework, we design simple yet effective neural networks for both the neural keyphrase set function and the keyphrase set extractor agent. In general, a more complex neural network should yield better performance. Moreover, for fairness, our model adopts the same pre-trained language model (i.e., BERT) as the recent state-of-the-art baselines (Liang et al., 2021; Ding and Luo, 2021; Zhang et al., 2022). Other pre-trained language models, such as RoBERTa (Liu et al., 2019), can also be applied to our model and may yield better results, which indicates that there is still much room for improvement within our proposed framework. Therefore, we believe the power of this set-wise framework has not been fully exploited. In the future, more forms of document-set matching models can be explored to instantiate the set-wise framework.


Figure 1 :
Figure 1: Illustration of extracting keyphrases from different perspectives. F_s(·) denotes a scoring function measuring the relevance between a candidate phrase p_i (a) or a candidate subset S_i (b) and the document D.

Figure 2 :
Figure 2: The document-set matching framework. Intuitively, a better candidate subset should be semantically closer to the document in the semantic space, and the optimal subset should be the closest.

Figure 3 :
Figure 3: The overall pipeline of our model.

Figure 4 :
Figure 4: The overall architecture of our document-set matching module.

Table 5 :
Figure 5 :
Figure 5: Results of comparing the teacher-critical training strategy with a series of fixed values.

Table 1 :
The statistics of the benchmarks. #Doc. is the number of documents. Type indicates the length of documents. Avg.#Words is the average number of words per document. Present Keyphrases in Truncated Doc. (512) and in Original Doc. indicate the ratio of keyphrases present in the truncated and original documents, respectively.

Table 3 :
Results of the different candidate set generation strategies on three benchmark datasets. The best results are in bold, and the second-best results are underlined. Here, R@M compares all the keyphrases extracted by the strategy with the ground-truth keyphrases, i.e., it considers all phrases in the candidate set. Specifically, for Inspec, A_1 ∪ B_2 = 0.6894 (102), where 0.6894 is the value of R@M and 102 is the average number of keyphrases in the candidate sets.

Table 4 :
Performance of training the neural keyphrase set function F_s with different loss functions. The best results are in bold.

Table 6 :
Diversity evaluation on the three benchmarks. Lower values indicate better diversity. Note that Porter stemming is applied before evaluation.