Keyphrase Generation Beyond the Boundaries of Title and Abstract

Keyphrase generation aims at generating important phrases (keyphrases) that best describe a given document. In scholarly domains, current approaches have largely used only the title and abstract of articles to generate keyphrases. In this paper, we comprehensively explore whether integrating additional information from the full text of a given article, or from semantically similar articles, can help a neural keyphrase generation model. We discover that adding sentences from the full text, particularly in the form of an extractive summary of the article, can significantly improve the generation of both keyphrases that are present in and keyphrases that are absent from the text. Experimental results with three widely used models for keyphrase generation, along with one of the latest transformer models suitable for longer documents, the Longformer Encoder-Decoder (LED), validate this observation. We also present FullTextKP, a new large-scale scholarly dataset for keyphrase generation. Unlike prior large-scale datasets, FullTextKP includes the full text of the articles along with the title and abstract. We release the source code at https://github.com/kgarg8/FullTextKP.


Introduction
Keyphrases of scientific papers provide important topical information about the papers in a highly concise form and are crucial for understanding the evolution of ideas in a scientific field (Hall et al., 2008; Augenstein et al., 2017). Keyphrases of scientific papers can also be useful for mining and analyzing the literature (Augenstein et al., 2017) and for multiple downstream tasks such as index construction (Ritchie et al., 2006), summarization (Qazvinian et al., 2010), query formulation (Song et al., 2006), recommendation (Augenstein et al., 2017), reviewer matching for paper submissions (Augenstein et al., 2017), and clustering papers for fast retrieval (Hammouda et al., 2005).
Identifying keyphrases in scientific documents has been widely studied under the paradigm of Keyphrase Extraction for over a decade (Mihalcea and Tarau, 2004; Wan and Xiao, 2008a; Florescu and Caragea, 2017; Liu et al., 2009, 2010; Sterckx et al., 2015, 2016; Caragea et al., 2014; Das Gollapalli and Caragea, 2014; Al-Zaidy et al., 2019; Patel and Caragea, 2021). However, a major limitation of these approaches is that they cannot capture the semantic information in the document, and in particular cannot predict absent keyphrases (keyphrases that do not appear in the document).
In this paper, we therefore focus on the newer paradigm of keyphrase generation introduced by Meng et al. (2017). Instead of extracting keyphrases from the document, we model the problem as a neural sequence-to-sequence (seq2seq) approach, where we learn to generate keyphrases autoregressively. This approach can predict not only the exact keyphrases that are present in a document (present keyphrases) but also those that are semantically relevant yet absent from the document (absent keyphrases). While many works (Meng et al., 2017; Chen et al., 2019b, 2020; Yuan et al., 2020; Ye et al., 2021b) adopt this approach, their focus is on architectural innovations to improve the generation of both present and absent keyphrases. In contrast, our focus differs from that of prior work in keyphrase generation: we propose to extend the source of information in the input sequence, which has thus far been limited to only the title and abstract.
To this end, we curate a large-scale dataset of papers published by ACM which contains not only the title and abstract of the documents but also their full text. We call this dataset FULLTEXTKP. Owing to the scarcity of such large-scale datasets with full text, and the difficulty of parsing and understanding the entire text of documents, the integration of additional information beyond the title and abstract for keyphrase generation has been largely ignored.
In this paper, we provide innovative ways of using parts of the documents that can be rich sources of information, e.g., citation sentences from the body of the document or an extractive summary of the document (as illustrated in Table 1). Interestingly, through comprehensive experiments, we find that the citation contexts that have been a rich source of information for the keyphrase extraction task (Caragea et al., 2014; Das Gollapalli and Caragea, 2014) are not the richest source of information for the keyphrase generation task. In contrast, we observe that the semantic information of the document is best assimilated if we summarize the document and use the summary instead of the document itself. To summarize each document, we use an unsupervised summarization approach and show that the extractive summary not only contains the most topical information useful for keyphrase generation but also fits the computational budget. Remarkably, the approach based on incorporating the summary outperforms methods that use just the title and abstract by a wide margin, often yielding 2-3x improvements in performance on both present and absent keyphrases.
Our experiments are performed with three widely used models for keyphrase generation as well as, to our knowledge for the first time in keyphrase generation, with a transformer model suited for long documents, the Longformer Encoder-Decoder (LED) (Beltagy et al., 2020).
Overall, we make the following contributions: 1. We explore the benefit of integrating additional information from different data sources (not just the title and abstract) into neural seq2seq models for keyphrase generation (i.e., predicting both present and absent keyphrases). The different data sources include random sentences from the article body, sentences from the summary of the body, citation sentences, non-citation sentences, and sentences from other related documents in the training set.
2. We show that the sentences from the body of the article that form an extractive summary are a rich source of good content for keyphrase generation.
3. We present a new dataset, FULLTEXTKP, of ∼140,000 articles, each with its full text.

Related Work
Keyphrase Generation. Recent approaches to keyphrase generation have been dominated by neural seq2seq models because they provide a mechanism to also generate absent keyphrases. Meng et al. (2017) originally proposed an RNN-based seq2seq model along with CopyNet (CopyRNN) for keyphrase generation. External Information. Some previous methods incorporate external information to enhance the performance of keyphrase extraction. For example, MAUI (Medelyan et al., 2009) uses semantic information from Wikipedia, CeKE (Caragea et al., 2014) and CiteTextRank (Gollapalli and Caragea, 2014) use information from citation networks, and SingleRank and ExpandRank (Wan and Xiao, 2008a) use information from local textual neighborhoods. However, these methods are limited to extraction, whereas we focus on generating keyphrases so that the neural model forms a holistic, semantic understanding of the document from its natural language text.
Several works in keyphrase generation have started to use external information to generate keyphrases. For example, Chen et al. (2019a) and Santosh et al. (2021) proposed methods that augment the input with additional keyphrases from semantically similar documents and use hidden state representations built from these keyphrases; Diao et al. (2020) used cross-document attention networks; Ye et al. (2021a) used information from retrieved document-keyphrase pairs that are similar to the source document; and Shen et al. (2021) used a phrase bank built by pooling the keyphrases of all articles. Similar to these works, one of our methods (Retrieval-Augmentation) retrieves relevant sentences from semantically similar documents and directly appends them to the title and abstract.
Datasets. There have been numerous efforts toward curating large-scale datasets for keyphrase extraction and generation. In the non-scholarly domain, datasets such as KPTimes (Gallina et al., 2019), JPTimes (Gallina et al., 2019), KPCrowd (Marujo et al., 2013), and DUC-2001 (Wan and Xiao, 2008b) were curated from news articles, while OpenKP (Xiong et al., 2019) was mined from search-engine webpages. In the scholarly domain, there has been ongoing research with datasets such as Krapivin (Krapivin et al., 2008), Inspec (Hulth, 2003), SemEval (Kim et al., 2010), NUS (Nguyen and Kan, 2007), KP20k (Meng et al., 2017), and OAGK (Cano and Bojar, 2018). The limitation of these datasets is that they are either very small (on the order of a few hundred or thousand documents) or restricted to titles and abstracts. For instance, the widely used KP20k dataset (Meng et al., 2017) consists of only the title and abstract of the research articles, and the average length per document is as low as 176 words. To leverage and understand the benefit of additional data, we mined a new dataset based on articles primarily from the ACM database that contains not just the titles and abstracts but also the full text of the scholarly documents. In our dataset, the typical length of a full paper is 5,000-10,000 tokens.

Methods
We use four different models: three popular models in the keyphrase generation task, viz. catSeq (Yuan et al., 2020), One2Set (Ye et al., 2021b), and ExHiRD (Chen et al., 2020); and a fourth model based on one of the latest Transformer models suitable for long documents, the Longformer Encoder-Decoder (LED) (Beltagy et al., 2020). The baseline in each model takes the title and abstract (T+A) as input and predicts a sequence of keyphrases. We explore different extensions (also illustrated in Table 1) for each model by concatenating different types of data to the title and abstract (T+A). During concatenation of the additional information, we use a delimiter <sep> between the title and abstract and between all the additional sentences, as follows: Title <sep> Abstract <sep> Sent_1 <sep> Sent_2 <sep> ... Sent_k. We describe the different types of input sequences below.

T+A+RANDOM: For this setup, we concatenate k randomly chosen sentences (from the body) to the title and abstract, in the original order of their occurrence in the body.

T+A+CITATIONS: For this setup, we collect all the sentences (from the body) that cite some other article. We call such sentences citation sentences. We randomly select k sentences from the entire pool and concatenate them to the title and abstract in the order of their occurrence in the body of the given article. The incorporation of citation sentences is inspired by Naïve-Bayes- and TextRank-based approaches (Gollapalli and Caragea, 2014; Caragea et al., 2014) and by citation-graph construction (Viswanathan et al., 2021), which enhanced keyphrase extraction performance through the integration of citation information.
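The delimiter-based concatenation above (Title <sep> Abstract <sep> Sent_1 <sep> ... <sep> Sent_k) can be sketched as follows. This is a minimal illustration; `build_input` and the word-level truncation budget are our own assumptions, not the paper's released preprocessing code:

```python
def build_input(title, abstract, extra_sents, max_len=800):
    """Join title, abstract, and additional sentences with <sep> delimiters,
    truncating to a word-level budget (hypothetical helper mirroring the
    format Title <sep> Abstract <sep> Sent_1 <sep> ... <sep> Sent_k)."""
    parts = [title, abstract] + list(extra_sents)
    tokens = []
    for i, part in enumerate(parts):
        if i > 0:
            tokens.append("<sep>")  # delimiter between every pair of segments
        tokens.extend(part.split())
    return " ".join(tokens[:max_len])
```

With this format, the number of extra sentences k becomes adaptive: we keep appending sentences until the 800-token budget described in §4 is exhausted.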
T+A+NON-CITATIONS: For this setup, we collect all the sentences (from the body) that do not cite any other article. We randomly select k sentences from the entire pool and concatenate them to the title and abstract in the order of their occurrence in the body of the given article.
T+A+SUMMARY: In this method, we summarize the body of the article using a state-of-the-art unsupervised summarization algorithm, PacSum (Zheng and Lapata, 2019), and append k sentences from the summary to the T+A of the article. Zheng and Lapata showed that TF-IDF-based PacSum is better than TextRank and other baselines, and is also highly competitive with BERT-based PacSum. PacSum could be replaced with other summarization algorithms, but we choose TF-IDF-based PacSum simply as a representative of a powerful and efficient model for unsupervised summarization to test the efficacy of summarizing the body.
The steps to obtain the extractive summary of a document using PacSum are as follows. Consider a document as a list of sentences and each sentence as a list of words. The entire document can be visualized as a directed graph in which the individual sentences are the nodes and the edges are weighted by similarity. We first calculate tf-idf to determine the relevance of a word to a sentence in the collection of sentences (the document), analogous to computing the tf-idf of a word in a document within a corpus of documents. We then compute the similarity between every pair of sentences. Next, we compute a threshold based on which we normalize the similarity scores. Subsequently, we compute forward and backward edge scores for the directed edges. Afterward, we compute the degree centrality of a sentence (node) in the document (graph) as the weighted average of its forward and backward edge scores. Finally, we rank the sentences by their centrality values, select the k highest-ranked sentences, and concatenate them to the title and abstract. We describe the algorithm in Appendix C.
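The steps above can be sketched in compact form. This is a simplified illustration of a PacSum-style summarizer under our own assumptions: the threshold fraction (0.3) and the forward/backward weights are illustrative stand-ins for the tuned hyperparameters of Zheng and Lapata (2019):

```python
import math
from collections import Counter

def pacsum_summary(sentences, k=3, lambda_fwd=0.3, lambda_bwd=0.7):
    """PacSum-style extractive summarization sketch: TF-IDF sentence
    similarities, threshold normalization, and position-aware (forward vs.
    backward) degree centrality. Hyperparameter values are illustrative."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # sentence-level document frequency

    def vec(d):  # tf-idf vector of one sentence over the document's sentences
        tf = Counter(d)
        return {w: tf[w] * math.log(n / df[w]) for w in tf if df[w] < n}

    vecs = [vec(d) for d in docs]

    def sim(a, b):  # dot product of sparse tf-idf vectors
        return sum(a[w] * b.get(w, 0.0) for w in a)

    S = [[sim(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    # Threshold so that weakly related sentence pairs contribute nothing.
    flat = [S[i][j] for i in range(n) for j in range(n) if i != j]
    thr = min(flat) + 0.3 * (max(flat) - min(flat)) if flat else 0.0
    centrality = []
    for i in range(n):
        fwd = sum(max(S[i][j] - thr, 0.0) for j in range(i + 1, n))
        bwd = sum(max(S[i][j] - thr, 0.0) for j in range(i))
        centrality.append(lambda_fwd * fwd + lambda_bwd * bwd)
    top = sorted(range(n), key=lambda i: -centrality[i])[:k]
    return [sentences[i] for i in sorted(top)]  # keep original document order
```

Returning the selected sentences in document order mirrors how we concatenate them to the title and abstract.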

T+A+RETRIEVAL-AUGMENTATION:
In this method, we retrieve k semantically similar sentences from the training corpus and augment the T+A of each article with them. To this end, we first create a set of all the sentences from the titles and abstracts of the articles in the entire training dataset. Next, we embed each sentence in the set using SPECTER (Cohan et al., 2020). We treat these embeddings as key embeddings representing the corresponding sentences (values). Given a target article (query), we then embed its title and abstract using SPECTER. This embedding serves as a query embedding to search for related sentences from other articles. Subsequently, we compute the dot-product similarity of the query embedding with all the key embeddings using FAISS (Johnson et al., 2019). Last, we select the k sentences (values) corresponding to the top-k most similar key embeddings and concatenate them to the title and abstract of the query article. We provide the algorithm in Appendix D.
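The retrieval step can be sketched as a plain dot-product lookup. In the paper this uses SPECTER embeddings and a FAISS index; pure-Python dot products stand in here purely for illustration:

```python
def retrieve_top_k(query_emb, key_embs, values, k=5):
    """Sketch of the Retrieval-Augmentation lookup: score every candidate
    sentence (value) by the dot product of its key embedding with the query
    embedding built from the target article's title and abstract."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = [(dot(query_emb, key), v) for key, v in zip(key_embs, values)]
    scored.sort(key=lambda sv: -sv[0])  # highest similarity first
    return [v for _, v in scored[:k]]
```

In practice the key embeddings are precomputed once over the whole training corpus, so only the single query embedding and the k-nearest search are needed per article.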

FULLTEXTKP dataset
To evaluate the performance of models leveraging different types of information, we construct a new dataset, which we call FULLTEXTKP. Our FULLTEXTKP dataset consists of research papers that are published by ACM and are available in the ACM digital library in its International Conference Proceedings Series (ICPS).
We used only the articles that have at least the following five fields: title, abstract, keywords, full text, and references. We lowercased the text and constructed numerous regexes to remove escape sequences, HTML tags, URLs, emails, etc. We also replaced all numbers and Roman numerals with a <digit> token. Further, we used PunktSentenceTokenizer to segment each document into sentences and NLTK's word_tokenizer to tokenize the sentences into tokens. We constructed suitable regexes to extract the citation sentences and considered all other sentences as non-citation sentences. Articles without any citation sentences, as well as duplicates, were removed from the collection. In total, we collected 142,844 articles. We split the FULLTEXTKP dataset 80/10/10 into train, test, and validation sets. Statistical information about the dataset is shown in Table 2.
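The cleaning steps above can be sketched as a small pipeline. The exact regexes used for FULLTEXTKP are not reproduced here; these patterns are illustrative approximations:

```python
import re

def preprocess(text):
    """Sketch of the FULLTEXTKP cleaning steps: lowercasing, stripping HTML
    tags/URLs/emails, and replacing numbers with a <digit> token. The regexes
    are illustrative approximations of the ones described in the paper."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                # html tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # urls
    text = re.sub(r"\S+@\S+\.\S+", " ", text)           # emails
    text = re.sub(r"\d+(\.\d+)?", " <digit> ", text)    # numbers -> <digit>
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace
```

After this cleaning, sentence and word tokenization (PunktSentenceTokenizer and NLTK's word_tokenizer) are applied to the normalized text.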
The construction of this dataset addresses the sparsity of large-scale datasets for keyphrase extraction/generation from scientific papers and is aimed at enabling deep learning modeling.Currently, there exist only a couple of large-scale datasets for this task, e.g., KP20K (Meng et al., 2017), and OAGK (Cano and Bojar, 2018).Unlike these datasets, which contain only the title and abstract of each paper, our dataset provides access to the full text of each paper.

Evaluation
We use the same evaluation method as Chan et al. (2019) and Chen et al. (2020) and report the macro-averaged Precision (P), Recall (R), and F1-scores (F1) in two different evaluation settings: @5 (dummy keyphrases are appended to the predicted keyphrases to bring the total count to 5) and @M (the generated keyphrases are compared directly against the gold keyphrases). All keyphrases are stemmed using PorterStemmer before comparison. As in prior works, we treat a keyphrase as absent if it is absent from the title and abstract, which lets us compare the present and absent keyphrase generation performance of different models in a consistent way.
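The difference between the @5 and @M settings can be made concrete with a small sketch (phrases are assumed to be already stemmed and deduplicated):

```python
def f1_at(preds, gold, cutoff=None):
    """F1 in the @5 / @M settings: with cutoff=5, dummy keyphrases pad the
    prediction list to exactly 5 before scoring (penalizing precision when
    fewer than 5 are generated); with cutoff=None (@M), all generated
    keyphrases are compared directly against the gold set."""
    if cutoff is not None:
        preds = preds[:cutoff] + ["<dummy>"] * max(0, cutoff - len(preds))
    hits = len(set(preds) & set(gold))
    p = hits / len(preds) if preds else 0.0
    r = hits / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

For example, a model that predicts 2 keyphrases of which 1 is correct (out of 2 gold) scores F1@M = 0.5 but a lower F1@5, since the three dummy pads count against precision.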
For the entire input, including the title, abstract, and the additional data from the different methods, we restrict the maximum sequence length to 800. This requires an adaptive k for the number of additional sentences that can fit within the given maximum sequence length. The choice of sequence length was based on the tradeoff between performance and the computational budget of the models. We also experiment with longer sequence lengths for the LED model in §5.3.

Results
In this section, we first present the results for present and absent keyphrase generation in §5.1 for the different methods described in §3. We then explore our best performing method, T+A+Summary, in two subcategories, extractive and abstractive summarization, in §5.2. Finally, in §5.3 we explore the performance of the best performing model, the Longformer Encoder-Decoder, with sequence lengths much longer than 800. We run all experiments three times and report the average.

Present & Absent Keyphrase Generation
Tables 3 and 4 present the results for present and absent keyphrase performance, respectively, for our different experiments. The results demonstrate that adding extra information from the body of the given article to the baseline (T+A) is beneficial for both present and absent keyphrase generation. Augmenting the summary sentences provides the most substantial boost of any method for both present and absent keyphrase performance. For instance, the improvements in present keyphrase performance on the F1@M metric are: catSeq (0.338 → 0.380), ExHiRD (0.325 → 0.366), and LED (0.360 → 0.397). The absent keyphrase performance with the Summary method improves by up to four times over the baseline; for example, catSeq (0.021 → 0.079), ExHiRD (0.030 → 0.080), and LED (0.061 → 0.106). Interestingly, with One2Set, although F1@M for present keyphrase generation does not improve with the Summary and Random methods, the trend for absent keyphrase performance is very similar to that of the other models. The discrepancy in present keyphrase performance could be because One2Set is potentially biased towards identifying keyphrases from the earlier portions of documents. For our setting of long documents, it may require better heuristics for initializing the control codes so as to attend to the later portions of the document. Still, in three out of four models, adding summary information gives the best results. This follows the natural intuition that the summary of an article contains the most topical information useful for generating its keyphrases.
Figure 1 further validates this intuition: the Summary method is the richest source of present keyphrases. The total number of present keyphrases in the input texts of all Summary samples is about 88,000 higher than in the Title+Abstract samples. Hence, the model learns to generalize better when trained on such highly topical input texts. Interestingly, we find that using only the citation sentences results in worse performance than using non-citation sentences or just random sentences. This observation contrasts with the usefulness of citation sentences in keyphrase extraction methods. We hypothesize that the reason is that keyphrase extraction methods focus mainly on identifying keyphrases by capturing statistical information, such as tf-idf or word co-occurrences, in the target document and the cited ones (through the citation sentences), whereas deep learning methods try to obtain a semantic or holistic understanding of the document itself (i.e., from all sentences).

Incorporating either non-citation sentences or random sentences gives very similar results because the majority of the text consists of non-citation sentences. Unfortunately, appending sentences retrieved from semantically similar documents (using SPECTER) performs worse than the baseline. We hypothesize that even though these sentences are similar at the embedding level, they can still differ significantly in meaning and may not contain exactly the same keyphrases. Rather, many such sentences may confuse the model and steer it away from predicting the gold keyphrases.
Among the four models, the Longformer Encoder-Decoder (LED) performs the best. We hypothesize that this is because it is based on the more sophisticated Transformer architecture and is suited for reading long documents. However, LED training takes at least 5-10 times longer than the catSeq model. We provide more details about the compute power and time in Appendix B.

We performed a 2-tailed statistical significance test with an alpha value of 0.05 (the probability of rejecting the null hypothesis when it is actually true); p-values (the probability of obtaining a result at least as extreme as the observed one when the null hypothesis is true) below 0.05 were considered statistically different. We observed that the results were statistically different for all models except One2Set. With One2Set, the additional information over the T+A baseline does not yield statistically different results.

Analysis on Summarization Methods
In this section, we further investigate the capability of our best performing method, extractive summarization, for keyphrase generation, in comparison with an abstractive summarization approach, abbreviated as T+A+Abs_Sum or simply Abs_Sum. For abstractive summarization, we use the BigBirdPegasus (Zaheer et al., 2020) model, pretrained on the ArXiv dataset (Cohan et al., 2018), to first generate the abstractive summaries. Next, we augment the title and abstract of the articles with sentences from these abstractive summaries. We then train our best performing model (LED) with the T+A+Abs_Sum method. In Table 5, we compare the results of T+A+Abs_Sum with the T+A+Summary (extractive) method. Note that, by default, Summary refers to the extractive summarization method (Zheng and Lapata, 2019), also abbreviated as Ext_Sum.
From Table 5, we observe that the extractive summary added to T+A outperforms the abstractive summary (added to T+A) by 0.034 (F1@M) and 0.052 (F1@M) for present and absent keyphrase generation, respectively. This behavior can be explained as follows: (1) Extractive summaries contain sentences that are more central to the paper (Zheng and Lapata, 2019); thus, they tend to bring complementary information from all parts of the paper, which improves the performance of keyphrase generation models. (2) We manually inspected a small subset of the dataset and observed that about two-thirds of the sentences in the abstractive summaries came from the Introduction sections of the papers, which contain an extended but often overlapping or paraphrased description of the abstract. Further, we observed that about 10-15% of the statements in the abstractive summaries were repetitive. We conclude that augmenting with repetitive statements, or statements whose information is redundant with the abstract, does not aid keyphrase generation performance.

Exploring longer sequence lengths with Longformer Encoder-Decoder
Scientific documents are often very long, ranging from a few thousand to tens of thousands of tokens. The Longformer Encoder-Decoder, proposed by Beltagy et al. (2020), can handle such long sequences of text. We therefore explore the effect of sequence lengths longer than 800 on present and absent keyphrase performance. We prepare different versions of T+A+Random with maximum sequence lengths of 800, 1500, 2000, and 2500. We chose the Random version instead of directly using the entire source document since Random is a better representative of a long document when the sequence must be truncated to a fixed budget (i.e., rather than truncating and completely removing a part of the document, we ensure coverage of sentences from the entire document).
Note that these sequence lengths are based on word-level tokenization with NLTK's word tokenizer. The corresponding lengths under subword tokenization (used by Transformers) are considerably longer, but LED can handle sequences of up to 16,000 tokens. The results in Table 6 show that both present and absent keyphrase performance generally increase as we increase the source text length. In the table, we also compare these different versions of T+A+Random with T+A+Summary at a much smaller sequence length of 800, so as to justify the purpose of a short, targeted summary. We observe that the performance of Summary-800 is quite comparable to that of Random-2000 for both present and absent keyphrases, particularly on the most important metric, F1@M. This further validates our hypothesis that the summary is indeed a richer source of information: a sequence length of 800 can condense the information that would otherwise require a sequence length of 2000. Furthermore, the summary has the benefit of fitting a smaller memory and computational budget. We observed that with increasing sequence lengths, the training time increased at least linearly and consumed a considerable amount of GPU and RAM memory.
If we were to add more sentences to the summary, we would expect its quality to degrade and converge to that of the Random versions; furthermore, it would no longer stand out as a summary of the document.
In Table 7, we show sample predictions of the best performing method, T+A+(Extractive) Summary, using the LED model. The gold keyphrases and the predicted keyphrases are tokenized and stemmed before comparison. We make the following observations from the table. First, the model can accurately predict both present and absent keyphrases. Second, some predicted keyphrases are near-matches of the gold keyphrases; despite their quality as keyphrases, due to the limitation of the exact-match metric, they do not contribute to the measured model performance.

Conclusion
In this paper, we explore numerous ways of incorporating additional data from the body of scholarly documents to improve performance on the keyphrase generation task. We conclude from the results that extractive summary sentences, which are more central to the paper, provide the most topical information for boosting both present and absent keyphrase generation performance. Citation, non-citation, and random sentences also bring complementary information that improves performance modestly. Other approaches, such as augmenting with semantically similar sentences from other papers or with abstractive summary sentences, which bring repetitive or redundant information to the titles and abstracts, are not effective. We present a comprehensive analysis with four models, including LED, previously unexplored for this task. Our work aims at breaking the barrier of using only titles and abstracts, and presents a large-scale dataset with full texts for keyphrase generation.

Limitations
One limitation of the proposed methods is the increased (up to 2-3x) compute time and memory for training compared to conventional training using only the T+A of the articles. We provide more details in Appendix B. Further, our best performing method, T+A+Summary, requires a modest additional overhead of pre-computing summaries for all the articles.
Another potential limitation of our work is that we cannot directly compare our models on the widely used datasets, e.g., KP20k, Inspec, Krapivin, and NUS, since these datasets do not include the full texts of the papers. To be comprehensive, we evaluated four models on the new dataset.
We encourage future work in the direction of better ways of integrating external information including more sophisticated approaches for the summarization of scientific documents.

Ethical Considerations
Since keyphrase generation finds direct application in many downstream tasks such as recommender systems, reviewer matching, and clustering articles for fast retrieval, it is important to consider the ethical implications of these models. The models are not perfect and might at times make misleading predictions. Users must exercise their own judgment when deploying the current models in decision-making systems.

Table 8: Time and memory consumption using the conventional method (T+A) and the new methods (T+A+Body), where Body could be sentences retrieved using any of the methods (Summary, Citation, Non-Citation, Random, Ret-Aug), with a total sequence length of 800.

B Time and Memory Consumption
In Table 8, we compare the time and memory utilization of the conventional method (Title+Abstract) and the new methods (Title+Abstract+Body) for our best performing model (LED). Body refers to sentences retrieved using any of the methods described in §3, i.e., Random, Citation, Non-Citation, Summary, or Retrieval-Augmentation.
The conventional method takes about 3.6 hours/epoch and 10 GB of CUDA memory on an A6000 GPU for training, whereas the newer methods take about 5.5 hours/epoch and about 30 GB of CUDA memory. The inference time is less than 1 GPU hour for both types of methods.

Figure 1 :
Figure 1: #Present and #Absent Keyphrases in terms of the whole input text (for various methods) in FULLTEXTKP dataset.+ indicates concatenation of title and abstract with the specified information.

Table 3 :
Results comparing present keyphrase performance with different methods of using additional data with the FULLTEXTKP dataset.Subscripts denote the standard deviation as multiples of ± 0.001.

Table 4 :
Results comparing absent keyphrase performance with different methods of using additional data with the FULLTEXTKP dataset.Subscripts denote the standard deviation as multiples of ± 0.001.

Table 5 :
Results comparing the performance of Abstractive Summarization vs. Extractive Summarization methods.Subscripts denote the standard deviation as multiples of ± 0.001.

Table 6 :
Results comparing the performance of Longformer Encoder-Decoder Model with source texts of different sequence lengths.Subscripts denote the standard deviation as multiples of ± 0.001.

Table 7 :
Predicted keyphrases by the Longformer Encoder-Decoder model are highlighted in cyan. The source text is in the format: Title <sep> Abstract <sep> Summary-Sent_1 <sep> ... <sep> Summary-Sent_k.