CiteSum: Citation Text-guided Scientific Extreme Summarization and Domain Adaptation with Limited Supervision

Scientific extreme summarization (TLDR) aims to form ultra-short summaries of scientific papers. Previous efforts on curating scientific TLDR datasets failed to scale up due to the heavy human annotation and domain expertise required. In this paper, we propose a simple yet effective approach to automatically extracting TLDR summaries for scientific papers from their citation texts. Based on the proposed approach, we create a new benchmark CiteSum without human annotation, which is around 30 times larger than the previous human-curated dataset SciTLDR. We conduct a comprehensive analysis of CiteSum, examining its data characteristics and establishing strong baselines. We further demonstrate the usefulness of CiteSum by adapting models pre-trained on CiteSum (named CITES) to new tasks and domains with limited supervision. For scientific extreme summarization, CITES outperforms most fully-supervised methods on SciTLDR without any fine-tuning and obtains state-of-the-art results with only 128 examples. For news extreme summarization, CITES achieves significant gains on XSum over its base model (not pre-trained on CiteSum), e.g., +7.2 ROUGE-1 zero-shot performance and state-of-the-art few-shot performance. For news headline generation, CITES performs the best among unsupervised and zero-shot methods on Gigaword.


Introduction
Scientific summarization typically regards the paper abstract as the ground-truth summary, as it is written by the authors themselves with relatively high quality and is readily available in most scientific documents. However, the paper abstract may not always be the ideal summary because it often involves certain details such as task description, background information, and experiment results (cf. the abstract of this paper). As a result, recent work (Cachola et al., 2020) has studied the problem of scientific extreme summarization, which aims at forming ultra-short summaries (usually one sentence) of papers, namely the TLDR summaries.

Paper Abstract: We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

Citation Text: Taigman et al. [8] proposed the Domain Transfer Network (DTN) to map a sample from one domain to an analog sample in another domain and achieved favorable performance on small resolution face and digit images.

Table 1: An example showing that the citation texts of a paper can often be used as its ultra-short summary.
However, unlike paper abstracts, ultra-short paper summaries are far from universally available. Only certain scientific venues such as OpenReview.net support a TLDR field during paper submission, which is completely optional, and not all submitted papers provide it. In addition, human-annotated summaries of scientific documents are costly and require domain expertise. As a consequence, the previous SciTLDR dataset (Cachola et al., 2020), using a combination of author-provided TLDRs and human-annotated TLDRs (rephrased from paper reviews on OpenReview), only collected around 2,000 examples for training and 600 for testing.
In this paper, we argue that citation texts can often serve as high-quality short summaries of the cited papers. In Table 1, we show the abstract of one paper and its citation sentence in a follow-up paper. We observe that the citation sentence introduces the cited method and its contributions in a concise and accurate manner. Motivated by such observations, we propose a simple yet effective approach to locating, extracting, and filtering citation texts from scientific papers. We then treat the processed citation texts as ground-truth summaries of the cited papers. Based on the proposed approach, we create a large-scale scientific extreme summarization benchmark, CiteSum, which is automatically derived from citation texts and around 30 times larger than the previous human-annotated dataset SciTLDR (Cachola et al., 2020).
We conduct a comprehensive analysis of CiteSum regarding its data characteristics and quality, meanwhile establishing strong baselines as references for future studies. We further verify the usefulness of CiteSum by demonstrating that models pre-trained on CiteSum, which we name CITES (Citation Text-guided Summarizer), exhibit superior generalizability during low-resource adaptation to new tasks and domains.
On the human-annotated scientific extreme summarization dataset SciTLDR (Cachola et al., 2020), our zero-shot BART-based (Lewis et al., 2020) CITES, without any fine-tuning, performs better than most fully-supervised baselines, including the fully-supervised BART model (without pre-training on CiteSum). Our few-shot CITES achieves state-of-the-art performance with only 128 labeled examples from SciTLDR. In addition, CITES outperforms its base model (BART) on two more diverse scientific tasks: discipline classification and title generation. When transferring to news extreme summarization, despite the domain mismatch, CITES achieves significantly better zero-shot performance than BART and PEGASUS (Zhang et al., 2020) (e.g., +7.2 ROUGE-1) and state-of-the-art few-shot performance on the XSum dataset (Narayan et al., 2018). Furthermore, CITES performs the best among unsupervised and zero-shot methods on the Gigaword news headline generation dataset (Rush et al., 2015).

Contributions. (1) We propose a simple yet effective approach to automatically extracting ultra-short paper summaries from citation texts. (2) Based on the proposed approach, we create a large-scale scientific extreme summarization benchmark CiteSum and conduct a comprehensive analysis of it. (3) We further verify the quality and usefulness of CiteSum by demonstrating that models pre-trained on CiteSum perform very well on new tasks and domains such as news extreme summarization and headline generation with limited training.

Citation Example 1: We take the publicly available Semantic Scholar Open Research Corpus (S2ORC) (Lo et al., 2020) as the source for data creation.

Citation Example 2: Unlike WikiTransfer (Fabbri et al., 2021), CITES does not involve any downstream task-specific data selection or model tuning.

Data Creation

Data Source. We take the publicly available Semantic Scholar Open Research Corpus (S2ORC) (Lo et al., 2020) as the source for data creation. In the latest version of S2ORC, there are 136M scientific papers from different academic disciplines and the number of papers with full-text access is 12M. We further remove papers without citation information, resulting in 9M papers as candidates.
Quality Control. Not all citation texts are of high quality, and not all can serve as summaries of the cited papers. In Table 2, we show two examples (in our paper) where the citation sentence simply (1) describes the data source or (2) introduces the difference of the citing paper from the cited paper. We note that prior studies on citation text generation (Chen et al., 2021; Ge et al., 2021) often do not filter these citation texts and simply treat all paragraphs/sentences with citations as the ground-truth labels, as their goals are not paper summarization but writing assistance.
To ensure data quality, we carefully locate, extract, and filter the citation texts of papers in the following manner. First, we only take citation texts in the Related Work section of a paper, which largely ensures that they describe the content of the cited paper rather than irrelevant information.

Table 3 (header): Dataset | Train / Val / Test | len src | len summ | Automatic?
Next, we measure the similarity between the citation texts and the cited papers and filter dissimilar pairs. Intuitively, if a citation sentence can serve as a high-quality summary, a certain amount of its content should come from the cited paper. Prior work (Lu et al., 2020) also showed that authors tend to cite a paper using the information in its abstract. We thus calculate the overlap between paper abstracts and their citation sentences, and filter those below a threshold T. We set T to 50/20/40 for ROUGE-1/2/L recall through manual examination, resulting in a ROUGE-1/2/L recall of 73.1/39.4/58.5 after filtering. As a reference, the ROUGE-1/2/L recall between paper abstracts and reference summaries on SciTLDR (Cachola et al., 2020) is 81.1/38.9/62.0 for author-provided (SciTLDR-Auth) and 65.2/17.9/45.7 for peer review-derived (SciTLDR-PR) TLDRs, respectively. That is, the abstraction level of CiteSum is between SciTLDR-Auth and SciTLDR-PR. This filtering step is rather strict as we prefer quality over quantity: only 93K of the 426K examples (21.8%) are kept.
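The recall-based filter described above can be sketched as follows. This is a minimal illustration using whitespace tokenization and only ROUGE-1/2 recall (the paper also thresholds ROUGE-L), with default thresholds matching the reported 50/20 values; the dataset's actual tokenization and ROUGE implementation are not specified.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(reference, candidate, n):
    """Percentage of reference n-grams that also appear in the candidate."""
    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return 100.0 * overlap / sum(ref.values())

def keep_pair(abstract, citation, t1=50.0, t2=20.0):
    """Keep an (abstract, citation) pair only if the citation's unigram and
    bigram recall against the abstract clear the thresholds."""
    a, c = abstract.lower().split(), citation.lower().split()
    return rouge_n_recall(c, a, 1) >= t1 and rouge_n_recall(c, a, 2) >= t2
```

Recall is computed with the citation sentence as the reference side, matching the intuition that most of a good citation summary's content should be drawn from the abstract.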
We further replace each citation span (e.g., "Taigman et al. [8]") with a special token "REF" as citation spans vary across papers but essentially have the same meaning (i.e., referring to a cited paper).

Footnote 3: We also experimented with semantic metrics such as BERTScore (Zhang et al., 2019), but they did not function as well as ROUGE-based metrics in our human evaluation.
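The citation-span masking step can be sketched with a regular expression. The exact patterns used for CiteSum are not specified, so the pattern below is an illustrative assumption covering bracketed ("Taigman et al. [8]") and year-style ("Smith (2020)") citations only.

```python
import re

# Hypothetical pattern: a capitalized name, optionally followed by "et al.",
# then a bracketed number or a parenthesized 4-digit year.
CITATION_SPAN = re.compile(
    r"[A-Z][A-Za-z-]+(?: et al\.?)?\s*(?:\[\d+\]|\(\d{4}\))"
)

def mask_citations(sentence: str) -> str:
    """Replace citation spans with the special token REF."""
    return CITATION_SPAN.sub("REF", sentence)
```

A production pipeline would more likely rely on the citation-span offsets provided by S2ORC rather than on surface patterns.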
Dataset Split. After data filtering and preprocessing, there are 92,946 examples in the final citation text-guided summarization dataset, which we name CiteSum. We take about 5% of the data as the validation set, 5% as the test set, and the remaining 90% as the training set. As one paper may be cited multiple times in different papers, we ensure that there is no label leakage by excluding papers used for evaluation from the training set.
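A leakage-free split of the kind described above can be sketched by grouping examples by the cited paper before splitting; `paper_id` is a hypothetical field name for the cited paper's identifier.

```python
import random

def split_by_paper(examples, val_frac=0.05, test_frac=0.05, seed=0):
    """Split examples so that all examples citing the same paper land in the
    same split, preventing label leakage between train and evaluation."""
    paper_ids = sorted({ex["paper_id"] for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(paper_ids)
    n_val = int(len(paper_ids) * val_frac)
    n_test = int(len(paper_ids) * test_frac)
    val_ids = set(paper_ids[:n_val])
    test_ids = set(paper_ids[n_val:n_val + n_test])
    train = [ex for ex in examples if ex["paper_id"] not in val_ids | test_ids]
    val = [ex for ex in examples if ex["paper_id"] in val_ids]
    test = [ex for ex in examples if ex["paper_id"] in test_ids]
    return train, val, test
```

Splitting at the paper level (rather than the example level) is what guarantees that a paper cited several times never appears in both training and evaluation.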

Data Analysis
Dataset Statistics. In Table 3, we show the data statistics of CiteSum and other relevant summarization datasets. In terms of data size, CiteSum is about half the size of other automatically constructed datasets like XSum (Narayan et al., 2018) and arXiv (Cohan et al., 2018) due to the availability of citation texts and our strict quality control. On the other hand, CiteSum is much larger than human-annotated datasets on paper summarization (Yasunaga et al., 2019; Cachola et al., 2020), almost 30 times larger than the SciTLDR dataset (Cachola et al., 2020).

Discipline Analysis. In Fig. 1, we show the discipline distribution of papers in CiteSum. The discipline information is derived from the field of study in the Microsoft Academic Graph (MAG) (Shen et al., 2018). We take the top field of study for each paper if there are multiple. We note that the discipline distribution in CiteSum is quite different from that of its data source S2ORC (Lo et al., 2020), where medicine and biology dominate. In contrast, most papers in CiteSum are in computer science. The shift in discipline distribution arises because we explicitly keep papers with a Related Work section, around 82.8% of which are computer science papers. We then take the citation texts in these papers, which largely cite papers in similar disciplines. As a result, most papers in CiteSum are from computer science, mathematics, and engineering.

Citation Analysis. In Fig. 2, we show the average number of citations for papers in CiteSum. Note that the citation count shown does NOT reflect the total number of citations due to data filtering, but rather how many times a paper appears in CiteSum as examples (with the same input and different citation sentences as target outputs). In total, there are 59,707 unique papers in CiteSum with an average citation count of 1.56, and 98% of the papers have fewer than 5 citations. Compared to prior work, we do not target only popularly cited papers (Yasunaga et al., 2019), and we use different citation texts as different training examples rather than as multiple reference summaries (Cachola et al., 2020).
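The per-paper citation counting described above can be sketched as follows; `paper_id` is a hypothetical field name for the cited paper's identifier.

```python
from collections import Counter

def citation_stats(examples):
    """Count how many examples each cited paper contributes, and report the
    average examples-per-paper and the fraction of papers with < 5 examples."""
    counts = Counter(ex["paper_id"] for ex in examples)
    avg = sum(counts.values()) / len(counts)
    frac_lt5 = sum(1 for c in counts.values() if c < 5) / len(counts)
    return counts, avg, frac_lt5
```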

Human Evaluation
We randomly sample 50 examples from CiteSum and ask two human annotators with a background in computer science to examine whether the citation sentences can serve as high-quality summaries of the cited papers.

Experiments on CiteSum
In this section, we experiment on CiteSum with state-of-the-art baselines and analyze their performance under different setups to provide references for future studies. Implementation and training details are provided in App. B.

Examined Methods
We use BART-large (Lewis et al., 2020) and PEGASUS-large (Zhang et al., 2020) as the base models as they are the state-of-the-art methods on multiple summarization datasets. We examine the base models with different inputs such as paper abstract (Abs), abstract+introduction+conclusion (AIC), and abstract+title. In addition to using the TLDR (citation text) as the only generation target, we evaluate two multi-task settings with paper title and discipline (Disci) as the targets, where different prefix tokens are added to the input such that the model can generate different targets given the same paper abstract as input (Cachola et al., 2020). We further evaluate the following extractive baselines. EXT-LEAD: a method that takes the first sentence of the paper abstract, which performs fairly well in news summarization. EXT-HEURISTIC: a heuristic method that looks for the first sentence containing "propose", "introduce", or "in this paper", as such sentences likely reflect the contribution of the paper; it falls back to EXT-LEAD if no such sentence is found. EXT-ORACLE: an upper bound that matches each sentence in the paper abstract with the reference summary and takes the sentence with the highest ROUGE-2 F1.
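The two heuristic extractive baselines can be sketched directly from their descriptions (EXT-ORACLE is omitted here since it requires a ROUGE implementation and access to the reference summary):

```python
def ext_lead(abstract_sents):
    """EXT-LEAD: return the first sentence of the abstract."""
    return abstract_sents[0]

# Cue phrases that often signal a paper's contribution sentence.
CUE_PHRASES = ("propose", "introduce", "in this paper")

def ext_heuristic(abstract_sents):
    """EXT-HEURISTIC: first sentence containing a contribution cue phrase,
    falling back to EXT-LEAD if none is found."""
    for sent in abstract_sents:
        if any(cue in sent.lower() for cue in CUE_PHRASES):
            return sent
    return ext_lead(abstract_sents)
```

Both baselines assume the abstract has already been split into sentences.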

Results
In Table 5, we show the results of various baseline methods on CiteSum. When given the paper abstract as the source document, PEGASUS performs worse than BART, and we thus use BART as the main model in the following experiments. Further adding the paper introduction and conclusion to the model input slightly improves performance, at the expense of longer training time and increased memory usage. The gains brought by adding title and discipline information to the model input are marginal, and using them for multi-task learning does not lead to clearly better results. The fact that methods proposed by recent studies, such as multi-task learning (Cachola et al., 2020), are ineffective on CiteSum indicates that CiteSum remains a challenging, unsolved task.
For the extractive baselines, EXT-LEAD performs significantly worse than in the news domain (Mao et al., 2020a). EXT-HEURISTIC improves upon EXT-LEAD drastically, yet still lags behind state-of-the-art methods by a large margin. EXT-ORACLE performs the best, and its performance is generally consistent with the numbers on the human-annotated SciTLDR dataset (Cachola et al., 2020). On the other hand, the fact that abstractive methods have approached the extractive upper bound indicates that more abstraction is needed to further improve model performance on CiteSum.
We believe that CiteSum provides a well-established testbed for future studies on (scientific) extreme summarization. The following future directions may be worth exploring: 1) how to better understand the structure and content of scientific papers with domain knowledge (via relevant papers, terminology, taxonomies, etc.); 2) how to better capture the differences in writing styles across various domains; and 3) how to improve the saliency, factual correctness, and explainability of TLDR summaries given their conciseness.

Transferring to New Tasks and Domains with CITES
To further verify the quality and usefulness of CiteSum, we adapt models pre-trained on CiteSum to new tasks and domains, some of which are rather different from CiteSum and make model transfer with limited supervision very challenging. Specifically, we name our pre-trained model CITES (Citation Text-guided Summarizer). CITES uses the simplest form in Sec. 3, with the paper abstract as input and the TLDR as target output. We evaluate CITES on various downstream tasks with no fine-tuning (zero-shot) or limited training examples (few-shot), including scientific extreme summarization on SciTLDR (Cachola et al., 2020), news extreme summarization on XSum (Narayan et al., 2018), and news headline generation on Gigaword (Rush et al., 2015). Additionally, we evaluate CITES on two more diverse tasks in the scientific domain, namely discipline classification and title generation, in a fully-supervised setting.

Scientific Extreme Summarization
Setup. SciTLDR (Cachola et al., 2020), the human-annotated scientific extreme summarization dataset, is an ideal testbed for further verifying the quality and usefulness of CiteSum since they both target extreme summarization, belong to the scientific domain (though CiteSum involves more disciplines), and share similar input/output formats (though CiteSum has slightly longer inputs). One noticeable difference, however, is the point of view of the summaries: in SciTLDR the reference summaries typically start with "We" or "This paper", while in CiteSum they often begin with "AuthorName et al." (replaced by the special token "REF" during preprocessing).
We propose two simple techniques to tackle such subtle style differences when adapting CITES to SciTLDR in a zero-shot setting without fine-tuning. The first technique is post-processing: we replace "REF" with "This paper" if the summary begins with "REF" and remove all other occurrences of "REF" within the summary. The second technique is prompting: we use "This paper" as a prompt in the model decoder such that the summary always starts with "This paper". Similarly, in the few-shot setting, we replace the leading "We" with "This paper REF" in the reference summaries of SciTLDR (on the training set only) to alleviate the style mismatch.
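The post-processing technique can be sketched as follows; token-level removal via whitespace tokenization is an assumption, since the paper does not specify how stray "REF" tokens are stripped.

```python
def postprocess_ref(summary: str) -> str:
    """Zero-shot style adaptation: a summary starting with 'REF' is rewritten
    to start with 'This paper'; remaining 'REF' tokens are dropped."""
    if summary.startswith("REF"):
        summary = "This paper" + summary[len("REF"):]
    return " ".join(tok for tok in summary.split() if tok != "REF")
```

The prompting technique, by contrast, needs no post-hoc edits: it constrains decoding by feeding "This paper" as the forced prefix of the decoder output.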
We use BART-large (Lewis et al., 2020) as the base model of CITES since most baselines on SciTLDR, including the state-of-the-art methods, use the same base model.

Zero-shot Results
In Table 6, we show the performance comparison of different methods on SciTLDR. In the zero-shot setting, CITES (post-processing) outperforms competitive fully-supervised baselines such as BERTSum (Liu and Lapata, 2019). CITES (prompting) performs even better than CITES (post-processing), outperforming the fully-supervised BART model it is based upon. Such results demonstrate the benefits of pre-training on CiteSum. CITES (prompting), without any fine-tuning, is also on par with the state-of-the-art method CATTS (Cachola et al., 2020), while slightly worse than CATTS XSUM, which pre-trains on the XSum dataset (Narayan et al., 2018) first.
We additionally test a zero-shot upper bound for CITES by providing our prompting model with the first 3 tokens of the reference summary (the most common ones are "We propose a" and "We present a") such that it knows how to start to summarize and (hopefully) which aspect to focus on. CITES (prompting, gold 3 tokens) achieves competitive ROUGE-1 and significantly better ROUGE-2/L than the extractive upper bound EXT-ORACLE that has access to the entire reference summary.

Table 6 (excerpt): ROUGE-1 / ROUGE-2 / ROUGE-L on SciTLDR
EXT-ORACLE: 47.7 / 24.7 / 38.5
Fully-supervised:
PACSUM (Zheng and Lapata, 2019): 19.3 / 4.0 / 15.1
BERTSum (Liu and Lapata, 2019): 38.5 / 16.6 / 30.5
MatchSum (Zhong et al., 2020): 42.7 / 20.0 / 34.0
BART (Lewis et al., 2020): 43.3 / 20.8 / 35.0
BART XSUM (Lewis et al., 2020): 42.5 / 21.1 / 34.9
CATTS (Cachola et al., 2020): 43.8 / 20.9 / 35.5
CATTS XSUM (Cachola et al., 2020): 44

Data Overlap. To ensure that the superior generalizability of CITES does not merely come from data leakage, we measure the overlap between CiteSum and SciTLDR. We consider two papers (near-)identical if their TF-IDF cosine similarity is greater than 0.9 and find that only 9.7% of papers in the test set of SciTLDR appear in the training set of CiteSum. Also, note that the training labels in CiteSum are automatically extracted citation sentences and differ from those of SciTLDR.
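The near-duplicate detection can be sketched with a small hand-rolled TF-IDF; this is a stand-in for a library vectorizer, and the exact tokenization and weighting used for the overlap check are not specified, so the weighting scheme below (raw tf times smoothed idf) is an assumption.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors (raw tf x smoothed idf) for a list of
    tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def near_duplicates(test_docs, train_docs, threshold=0.9):
    """Flag each test document whose TF-IDF cosine similarity to any
    training document exceeds the threshold."""
    vecs = tfidf_vectors(test_docs + train_docs)
    test_vecs, train_vecs = vecs[:len(test_docs)], vecs[len(test_docs):]
    return [any(cosine(u, v) > threshold for v in train_vecs)
            for u in test_vecs]
```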

Scientific Discipline Classification and Title Generation
We have demonstrated the effectiveness of CITES on the task of scientific extreme summarization. Next, we explore the feasibility of transferring CITES to more diverse tasks.
Setup. We evaluate CITES on the tasks of scientific discipline classification and title generation. Similar to the multi-task experiments in Sec. 3, we use the same dataset split and model input, while changing the generation target from summaries (citation texts) to the discipline or title of the papers. Examples with unavailable discipline or title are removed. We use BART-large as the base model for this experiment and compare CITES with BART in an apples-to-apples comparison.

Results
In Table 7, we show the performance comparison on title generation and discipline classification. CITES consistently outperforms BART on both tasks, although the differences are not as significant as in other low-resource transfer experiments. The moderate gains are possibly because there is abundant training data for the two tasks, and continued pre-training thus does not help much. As evidence, the (unweighted) Macro-F1 of CITES is considerably better than that of BART, which we find is because CITES performs well on disciplines with fewer examples. Regarding the Weighted-F1, CITES is only slightly better, as most papers belong to a single discipline (computer science) that dominates the score.

News Extreme Summarization
Setup. With the success on different tasks in the scientific domain, we next evaluate CITES in a more difficult setting where the domain is significantly different while the task is still extreme summarization. We take the XSum dataset (Narayan et al., 2018) in the news domain for this purpose. We mainly use PEGASUS-large (Zhang et al., 2020) as the base model of CITES as its fully-supervised version holds the state-of-the-art results on XSum. We additionally evaluate CITES Title in the zero-shot setting, which is the variant used for title generation in Sec. 4.2.

Zero-shot Results
In Table 8, we show the results on XSum with various training data sizes. In the zero-shot setting, CITES significantly improves over its base model PEGASUS (+7.2 ROUGE-1). In the few-shot setting, CITES outperforms all baseline methods, including WikiTransfer, and achieves state-of-the-art few-shot performance on XSum. In particular, CITES performs better than fully-supervised methods such as BERTSum (Liu and Lapata, 2019) with only 100 examples.

News Headline Generation
Setup. To take a step further, we study the transfer performance of CITES on news headline generation. We use the Gigaword headline generation dataset (Rush et al., 2015) for this evaluation. We again consider two variants of CITES, one pre-trained with citation texts as the generation target and the other further pre-trained with paper titles as in Sec. 4.2. We use BART-large (Lewis et al., 2020) as the base model in this evaluation.

Results
In Table 9, we show the results of various methods on news headline generation. CITES again outperforms its base model (BART) significantly and achieves competitive performance with most unsupervised and zero-shot methods designed for news summarization (Zhang et al., 2020; Zhu et al., 2021a). CITES Title further achieves state-of-the-art zero-shot performance despite being pre-trained on the scientific domain, demonstrating the generalizability and usefulness of CiteSum.

Related Work
Citation Text Generation. There have been prior studies utilizing citation texts for different purposes. One popular line of work focuses on generating citation texts for writing assistance or paper comparison (Xing et al., 2020; Luu et al., 2021; Chen et al., 2021; Ge et al., 2021). However, they typically do not distinguish the citation texts that can serve as summaries of the cited paper from those used for other purposes, e.g., background or result comparison (Cohan et al., 2019). For example, Chen et al. (2021) treat citation text generation as a multi-document summarization task, where the target output is a paragraph with more than two citations and the model input is the abstracts of all cited papers. There is no filtering of the citation texts, and all paragraphs with enough citations are included. Besides including citation texts with various intents and lacking quality control, prior studies differ from CiteSum in that they target longer outputs, e.g., multiple sentences (Xing et al., 2020) or the entire Related Work section (Lu et al., 2020; Chen et al., 2021).
Citation Text for Paper Summarization. Another line of work does not generate but extracts citation texts, and either uses them to form a summary directly (Nakov et al., 2004; Abu-Jbara and Radev, 2011; Qazvinian et al., 2013) or treats them as a bridge to the cited paper (Cohan and Goharian, 2015; Yasunaga et al., 2019). Specifically, the citation texts in the latter studies are used to find relevant contexts in the cited paper (called citation contexts). Then, a long summary is formed primarily from the cited paper, e.g., by selecting sentences from the citation contexts (Cohan and Goharian, 2015). Unlike CITES, prior citation-based summarization methods require (often multiple) citation texts of a paper as input, which are unavailable for new papers. In addition, they target abstract-length rather than ultra-short summaries.
Extreme Summarization. Extreme summarization aims to form ultra-short summaries of documents. Notable benchmarks in this direction include XSum (Narayan et al., 2018) and NewSHead (Gu et al., 2020) in the news domain, SciTLDR (Cachola et al., 2020) in the scientific domain, and Webis-TLDR-17 (Völske et al., 2017) for social media summarization. Compared to SciTLDR, our CiteSum dataset is significantly larger in scale, covers more venues than OpenReview, and is composed of various disciplines.
Summarization with Limited Supervision. Our work is also related to unsupervised and zero/few-shot summarization that constructs weakly supervised guidance signals using, e.g., data characteristics (Chu and Liu, 2019; Mao et al., 2020b), domain knowledge (Zhu et al., 2021b), or pseudo-labeled data (Yang et al., 2020; Zhong et al., 2022). Compared to prior studies, CITES shows strong cross-domain capability that has not been well explored.

Conclusion
In this paper, we propose a simple yet effective approach to automatically extracting ultra-short paper summaries from citation texts. Based on the proposed approach, we create a large-scale, high-quality benchmark for scientific extreme summarization. We conduct a comprehensive analysis of the created benchmark and further demonstrate that models pre-trained on it exhibit superior generalizability to new tasks and domains, such as news extreme summarization and headline generation, with limited supervision.

Limitations
Regarding data collection, while we have taken multiple steps to improve data quality, as in all automatically created datasets, there are still low-quality examples. We show some examples of low quality in App. A. Limiting citation texts to the Related Work section improves data quality, but also excludes the majority of available citation sentences and makes CiteSum concentrated in the fields of computer science and engineering.
Regarding model performance, our transfer experiments are performed in scientific and news domains.While promising, there is no guarantee that CITES works well in other domains.Also, with abundant in-domain training data, pre-training on CiteSum may not lead to significant improvements.

<Rating 1>
Paper Title: Congested traffic states in empirical observations and microscopic simulations
Paper Abstract: We present data from several German freeways showing different kinds of congested traffic forming near road inhomogeneities, specifically lane closings, intersections, or uphill gradients. The states are localized or extended, homogeneous or oscillating. Combined states are observed as well, like the coexistence of moving localized clusters and clusters pinned at road inhomogeneities, or regions of oscillating congested traffic upstream of nearly homogeneous congested traffic. The experimental findings are consistent with a recently proposed theoretical phase diagram for traffic near on-ramps [D. Helbing, A. Hennecke, and M. Treiber, Phys. Rev. Lett. 82, 4360 (1999)]. We simulate these situations with a novel continuous microscopic single-lane model, the "intelligent driver model" (IDM), using the empirical boundary conditions. All observations, including the coexistence of states, are qualitatively reproduced by describing inhomogeneities with local variations of one model parameter. We show that the results of the microscopic model can be understood by formulating the theoretical phase diagram for bottlenecks in a more general way. In particular, a local drop of the road capacity induced by parameter variations has practically the same effect as an on-ramp.
Citation Text: In a first approach, we use the well-known "intelligent driver model" (IDM) REF to show that the method works.

<Rating 2>
Paper Title: Probabilistic Model-Agnostic Meta-Learning
Paper Abstract: Meta-learning for few-shot learning entails acquiring a prior over previous tasks and experiences, such that new tasks can be learned from small amounts of data. However, a critical challenge in few-shot learning is task ambiguity: even when a powerful prior can be meta-learned from a large number of prior tasks, a small dataset for a new task can simply be too ambiguous to acquire a single model (e.g., a classifier) for that task that is accurate. In this paper, we propose a probabilistic meta-learning algorithm that can sample models for a new task from a model distribution. Our approach extends model-agnostic meta-learning, which adapts to new tasks via gradient descent, to incorporate a parameter distribution that is trained via a variational lower bound. At meta-test time, our algorithm adapts via a simple procedure that injects noise into gradient descent, and at meta-training time, the model is trained such that this stochastic adaptation procedure produces samples from the approximate model posterior. Our experimental results show that our method can sample plausible classifiers and regressors in ambiguous few-shot learning problems.
Citation Text: They extended their approach by incorporating a probabilistic component such that for a new task, the model is sampled from a distribution of models to guarantee a higher model diversification for ambiguous tasks REF.

Figure 1 :
Figure 1: Discipline distribution of papers in CiteSum. Log scale is used for clearer illustration. Disciplines with lower than 0.1% distribution are omitted.

<Rating 3>
Paper Title: A Generic Multi-Projection-Center Model and Calibration Method for Light Field Cameras
Paper Abstract: Light field cameras can capture both spatial and angular information of light rays, enabling 3D reconstruction by a single exposure. The geometry of 3D reconstruction is affected significantly by the intrinsic parameters of a light field camera. In the paper, we propose a multi-projection-center (MPC) model with 6 intrinsic parameters to characterize light field cameras based on the traditional two-parallel-plane (TPP) representation. The MPC model can generally parameterize light fields in different imaging formations, including conventional and focused light field cameras. By the constraints of 4D rays and 3D geometry, a 3D projective transformation is deduced to describe the relationship between the geometric structure and the MPC coordinates. Based on the MPC model and projective transformation, we propose a calibration algorithm to verify our light field camera model. Our calibration method includes a closed-form solution and a non-linear optimization by minimizing re-projection errors. Experimental results on both simulated and real scene data have verified the performance of our algorithm.
Citation Text: Zhang et al. REF proposed a multi-projection-center (MPC) model with six intrinsic parameters to characterize both conventional and focused LF cameras.

<Rating 4>
Paper Title: Advancing Research Infrastructure Using OpenStack
Paper Abstract: Cloud computing, which evolved from grid computing, virtualisation and automation, has the potential to deliver a variety of services to the end user via the Internet. Using the Web to deliver Infrastructure, Software and Platform as a Service (SaaS/PaaS) has the benefit of reducing the cost of investment in internal resources of an organisation. It also provides greater flexibility and scalability in the utilisation of the resources. There are different cloud deployment models: public, private, community and hybrid clouds. This paper presents the results of research and development work in deploying a private cloud using OpenStack at the University of Huddersfield, UK, integrated into the University campus Grid QGG. The aim of our research is to use a private cloud to improve the High Performance Computing (HPC) research infrastructure. This will lead to a flexible and scalable resource for research, teaching and assessment. As a result of our work we have deployed the private QGG-cloud and devised a decision matrix and mechanisms required to expand HPC clusters into the cloud, maximising the resource utilisation efficiency of the cloud. As part of teaching and assessment of computing courses an Automated Formative Assessment (AFA) system was implemented in the QGG-Cloud. The system utilises the cloud's flexibility and scalability to assign and reconfigure required resources for different tasks in the AFA. Furthermore, the throughput characteristics of assessment workflows were investigated and analysed so that the requirements for cloud-based provisioning can be adequately made.
Citation Text: In REF, the authors focus on the use of a private cloud environment in order to improve the High Performance Computing (HPC) research infrastructure.

Table 2 :
Examples (in our paper) showing that citation texts have different intents and cannot always be used as summaries of the cited paper.

Table 4 :
Ratings of citation sentences in CiteSum regarding whether they can serve as high-quality summaries of the cited papers.

Table 5 :
Performance of different methods on CiteSum. BART-large is used as the base model if not otherwise specified. "/" indicates multi-task learning. R stands for ROUGE (Lin, 2004) in all the tables.

Table 7 :
Comparison of CITES and its base model (BART) on title generation and discipline classification.

Table 8 :
Performance comparison on the XSum dataset. Our few-shot results are averaged over 3 runs.

Table 9 :
Performance comparison on the Gigaword news headline generation dataset.

Table 10 :
Examples in CiteSum with different quality ratings.

Table 11 :
Examples in CiteSum with different quality ratings.