OASum: Large-Scale Open Domain Aspect-based Summarization

Aspect- or query-based summarization has recently attracted more attention, as it can generate differentiated summaries based on users' interests. However, existing datasets for aspect- or query-based summarization either focus on specific domains, contain relatively few instances, or include only a few aspect types. Such limitations hinder further exploration in this direction. In this work, we take advantage of crowd-sourced knowledge on Wikipedia.org and automatically create a high-quality, large-scale open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its capacity for diverse aspect-based summary generation. To overcome the data scarcity problem in specific domains, we also perform zero-shot, few-shot, and fine-tuning experiments on seven downstream datasets. Specifically, the zero/few-shot and fine-tuning results show that the model pre-trained on our corpus exhibits strong aspect- or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.


Introduction
Text summarization aims to provide accurate, concise, and useful information about the original inputs for users to browse quickly. Existing generic, aspect-agnostic summarization methods (See et al., 2017; Narayan et al., 2018; Liu, 2019; Zhang et al., 2020; Liu et al., 2022; Wang et al., 2022b) typically generate only one summary for all different requests, which is not optimal for diverse demands. They could fail to preserve the information the user needs or miss important details (Woodsend and Lapata, 2012; Angelidis and Lapata, 2018). By contrast, aspect- or query-based summarization methods (Xu and Lapata, 2020; Zhong et al., 2021; Ahuja et al., 2022) provide the flexibility of generating summaries for differentiated demands.

Figure 1: The left section titles are naturally adopted from the Wikipedia page to serve as different aspects, while the middle abstract is the head section serving as an overall summary of the article. The right part is the corresponding aspect-based summary.
However, existing datasets for aspect-based summarization are either small in scale (Wang et al., 2022a; Bahrainian et al., 2022a; Kulkarni et al., 2020), focused on a specific domain (Zhong et al., 2021; Zhan et al., 2022), or limited in aspects (Frermann and Klementiev, 2019; Hayashi et al., 2021). To the best of our knowledge, there is no existing dataset with millions of aspects and instances for large-scale open-domain aspect-based summarization. Models trained on a small-scale dataset with limited instances or aspects may fail to adapt to other aspects or domains in realistic open-domain scenarios.
To tackle the limitations of the existing aspect-based summarization datasets, we propose a large-scale open-domain aspect-based summarization dataset named OASum. Table 1 compares OASum with seven existing datasets for aspect- or query-based summarization.
To create the data, as illustrated in Fig. 1, we take advantage of crowd-sourced knowledge in English Wikipedia pages and parse them to collect the information on each page, including the title of each section and its contents. On the one hand, the head section is a natural abstract of each Wikipedia page. On the other hand, the remaining sections describe different aspects of that page. Therefore, we use the section titles as the aspect inputs and apply a rule-based process to automatically select sentences in the abstract section as the matched summary for different aspects. Specifically, we use the Wikipedia dump from 2022/06/21. It contains around 6.3 million pages after parsing. After preprocessing, we keep approximately 2 million pages that contain around 3.7 million instances in total. Our dataset includes 1,045,895 different aspects on 32,956 different domains (categorized with the original Wikipedia pages), providing plenty of useful information for open-domain aspect-based summarization. It also provides abstractive summaries that are not directly extracted from the original inputs. To ensure quality, we perform a manual evaluation on 66 randomly selected pages, and the overall satisfaction score is 3.13 out of 5. Based on our curated million-level aspect-based summarization corpus, we pretrain a Longformer-Encoder-Decoder (LED) (Beltagy et al., 2020) model on OASum in an end-to-end way. Compared with the backbone model, our pretrained model achieves better performance on six out of seven downstream tasks in the fine-tuning and zero-shot settings, and on all six applicable downstream tasks in the few-shot setting.
The contributions of our work are twofold: • We create the first large-scale open-domain aspect-based summarization dataset, named OASum. Statistics show that OASum covers a wide range of input lengths, highly abstractive summaries, and content across a large number of aspects and domains. Overall, it contains more than 3.7M instances and 1M different aspect types.
• We further pre-train the backbone model on OASum and test the pretrained model under zero-shot, few-shot, and fine-tuning settings on seven downstream datasets. The results illustrate that OASum provides useful information that can further benefit other query/aspect-based summarization tasks.

Related Works
Aspect / Query based Summarization. Aspect-based summarization was proposed to generate summaries based on different aspects for opinions and reviews (Kansal and Toshniwal, 2014; Wu et al., 2016; Akhtar et al., 2017; Angelidis and Lapata, 2018; Coavoux et al., 2019; Tan et al., 2020). Recent research attempts to summarize different aspects for news (Frermann and Klementiev, 2019; Bahrainian et al., 2022a; Ahuja et al., 2022) and other domains (Hayashi et al., 2021; Zhan et al., 2022). Similarly, query-based summarization (Kulkarni et al., 2020; Zhong et al., 2021; Wang et al., 2022a) takes finer-grained questions as input for summarization. As our OASum contains even finer-grained aspects, we believe it can benefit both tasks.

Wikipedia as Data. Wikipedia has been widely used as a rich source for many NLP tasks, including language modeling (Guo et al., 2020), question answering (Yang et al., 2015; Rajpurkar et al., 2018), information extraction (Wu and Weld, 2010), dialogue (Dinan et al., 2018), and summarization (Liu et al., 2018; Ghalandari et al., 2020; Sun et al., 2021; Iv et al., 2022). WikiAsp (Hayashi et al., 2021) directly uses external documents to generate the corresponding section contents with limited aspect types. Comparatively, OASum employs a matching method to obtain the aspect-based summaries from the head section of a Wikipedia page according to their similarities to the remaining page, resulting in more than one million aspect types.
Since OASum has a large number of instances containing more than 4096 input tokens, we use LED (Beltagy et al., 2020) as our backbone model.

Dataset Construction
This dataset is built upon the observation that the abstract section is a natural summary of the later sections, and sentences in the abstract section may present one or more aspects described in the later sections. We use the English Wikipedia dump from 2022-06-20 for creating our dataset. Originally, there are over 6.33 million pages.

Data Cleaning. Each Wikipedia page is written in a special markup language. We first adopt a tool (Pan et al., 2017) to remove all undesired markups (e.g., templates, internal/external links, and HTML tags) and keep section boundaries. Next, we discard structural sections including References, See also, External links, Further reading, and Bibliography. We further remove structural contents such as item lists in other sections. Finally, we split sentences using spaCy. We collect 3.75 million non-empty pages after data cleaning.

Aspect Summary Construction. An abstract sentence should be considered a summary sentence of a specific aspect iff it has enough content overlap with the corresponding section. As shown in Algorithm 1, we first use a greedy method to map each abstract sentence to a list of sentences in the later sections. Then, we assign a matching score S(x, a) to each abstract sentence x and a potential aspect a. We use the ROUGE-1 recall between the abstract sentence x and the intersection of its mapped sentences M(x) and the sentences in the aspect section Y_a:

S(x, a) = \mathrm{ROUGE\text{-}1_{recall}}\big(x,\ \mathcal{M}(x) \cap \mathcal{Y}_a\big)
This score indicates the content overlap between the abstract sentence and the aspect section. To filter out sentences with limited content overlap, an aspect-based summary includes only abstract sentences with a matching score S(x, a) greater than or equal to a pre-defined threshold λ. To determine the exact value of the threshold, we try λ ∈ {0.3, 0.4, 0.5, 0.6, 0.7} and evaluate the results manually. Specifically, we randomly pick 66 Wikipedia pages consisting of 103 aspect-summary pairs for each threshold and assign them to 5 experts for evaluating the dataset quality. Cohen's kappa between annotators is 0.43, showing moderate agreement. The results are shown in Table 2. We then choose λ = 0.5.

Data Splitting. We split the data into train/validation/test sets with 94%/3%/3% of the Wikipedia pages after data cleaning. After filtering out the instances where the summary is longer than the input text, we obtain 3,523,986/111,578/112,005 instances for the train/validation/test sets. In Table 3, we demonstrate the aspect-based summaries constructed from the "Seattle" Wikipedia page.
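To make this construction step concrete, the following is a minimal Python sketch of the scoring and filtering logic described above. It is an illustration under our own assumptions rather than the released pipeline: the greedy mapping M(x) is taken as a precomputed input, and a simple unigram recall stands in for the full ROUGE-1 recall.

```python
from typing import Dict, List


def unigram_recall(reference: str, candidate_text: str) -> float:
    """Simplified stand-in for ROUGE-1 recall of `reference` against `candidate_text`."""
    ref_tokens = reference.lower().split()
    cand_tokens = set(candidate_text.lower().split())
    if not ref_tokens:
        return 0.0
    return sum(tok in cand_tokens for tok in ref_tokens) / len(ref_tokens)


def build_aspect_summaries(abstract: List[str],
                           sections: Dict[str, List[str]],
                           mapped: Dict[str, List[str]],
                           lam: float = 0.5) -> Dict[str, List[str]]:
    """Keep abstract sentence x in the summary of aspect a iff S(x, a) >= lam.

    `sections` maps an aspect (section title) to its sentences Y_a;
    `mapped` maps each abstract sentence x to its greedily mapped sentences
    M(x), assumed to be computed beforehand (Algorithm 1 in the paper).
    """
    summaries: Dict[str, List[str]] = {aspect: [] for aspect in sections}
    for x in abstract:
        for aspect, section_sents in sections.items():
            # Intersection of the mapped sentences M(x) and the aspect section Y_a.
            intersection = [s for s in mapped.get(x, []) if s in set(section_sents)]
            score = unigram_recall(x, " ".join(intersection))  # S(x, a)
            if score >= lam:
                summaries[aspect].append(x)
    return summaries
```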

Data Statistics and Analysis
In this section, we demonstrate the properties of our dataset from different perspectives, including the statistics of input and output lengths, abstractiveness, aspect distribution, and page ontology.
Length. On average, the input documents have 1,856.09 tokens or 62.23 sentences, and the output summary contains 48.61 tokens or 1.77 sentences. In Fig. 2, we further plot the length distribution functions for both inputs and outputs. We find OASum covers a wide variety of lengths for both inputs and outputs. The inputs range from 4 to 78,498 tokens, while the outputs range from 3 to 9,792 tokens. This creates a playground suitable for tackling long-tail problems that involve both lengthy inputs and extended summaries. In addition, the compression ratios of OASum are distributed widely, from 0.685 to 32,148, which may promote the research of generating summaries with different granularities.

Abstractiveness. We use novel n-gram ratios between the article and the summary to measure the abstractiveness of the summary. More than 15.96/59.45/81.00/89.68 percent of unique 1/2/3/4-grams do not appear in the original input. This indicates that the summaries are highly abstractive.

History: Seattle is a seaport city on the West Coast of the United States. It is the seat of King County, Washington. The Seattle area was inhabited by Native Americans for at least 4,000 years before the first permanent European settlers. Arthur A. Denny and his group of travelers, subsequently known as the Denny Party, arrived from Illinois via Portland, Oregon, on the schooner "Exact" at Alki Point on November 13, 1851. The settlement was moved to the eastern shore of Elliott Bay and named "Seattle" in 1852, in honor of Chief Siáhl of the local Duwamish and Suquamish tribes. Growth after World War II was partially due to the local Boeing company, which established Seattle as a center for aircraft manufacturing. The Seattle area developed into a technology center from the 1980s onwards with companies like Microsoft becoming established in the region; Microsoft founder Bill Gates is a Seattleite by birth. The stream of new software, biotechnology, and Internet companies led to an economic revival, which increased the city's population by almost 50,000 between 1990 and 2000. Seattle also has a significant musical history.

Geography: Seattle is situated on an isthmus between Puget Sound (an inlet of the Pacific Ocean) and Lake Washington.

Economy: A major gateway for trade with East Asia, Seattle is the fourth-largest port in North America in terms of container handling. Internet retailer Amazon was founded in Seattle in 1994, and major airline Alaska Airlines is based in SeaTac, Washington, serving Seattle's international airport, Seattle-Tacoma International Airport.

Culture: Between 1918 and 1951, nearly two dozen jazz nightclubs existed along Jackson Street, from the current Chinatown/International District to the Central District. The jazz scene nurtured the early careers of Ray Charles, Quincy Jones, Ernestine Anderson, and others. Seattle is also the birthplace of rock musician Jimi Hendrix, as well as the origin of the bands Nirvana, Pearl Jam, Soundgarden, Heart, Alice in Chains, Foo Fighters, and the alternative rock movement grunge.

Demographics: Today, Seattle has high populations of Native, Scandinavian, Asian American and African American people, as well as a thriving LGBT community that ranks sixth in the United States by population.

Table 3: Example of aspect-based summaries constructed from the "Seattle" page. We only show part of the aspect summaries.

Aspect Distribution. The ten most frequent aspect types (see Table 8) contain 447,589, 171,447, 69,266, 45,134, 43,398, 42,664, 36,199, 34,663, 34,057, and 33,424 instances, respectively. As shown in Fig. 5, we find that the top 30% of aspect types cover 80.5% of all the cases, while the remaining 19.5% of cases come from the other 70% of aspects. This naturally provides open-domain and diverse multi-aspect knowledge for aspect-based summarization.
Ontology. We analyze the domain distribution of our dataset using the ontology information provided by Wikidata's instance of (P31) property.

Baselines and Analysis

Metrics and Models
In this section, we investigate the baseline models' performance on OASum, covering heuristic methods (Heu), unsupervised methods, aspect-agnostic extractive methods (Ext), and aspect-based abstractive methods (Abs). Our results are reported with ROUGE metrics (Lin, 2004), including ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum. We compare our system with extractive and abstractive summarization baselines.
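For reference, these metrics can be computed with the open-source rouge-score package; the package choice is our assumption, as the paper does not name its ROUGE implementation. A minimal sketch:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(
    ["rouge1", "rouge2", "rougeL", "rougeLsum"], use_stemmer=True
)

# Illustrative reference/prediction pair; rougeLsum expects sentences
# to be separated by newlines.
reference = "Seattle is a seaport city on the West Coast of the United States."
prediction = "Seattle is a port city on the US West Coast."

scores = scorer.score(reference, prediction)
for name, score in scores.items():
    print(f"{name}: P={score.precision:.4f} R={score.recall:.4f} F={score.fmeasure:.4f}")
```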
ORACLE is generated by comparing the reference summary against each sentence in the document and greedily selecting the sentences with the best ROUGE scores (Liu and Lapata, 2019); a sketch of this greedy procedure is given below.

RANDOM-N selects random sentences as the summary. We choose the same number of sentences as in the reference summary.

LEAD-N: The leading sentences are known to be a good summary, especially in the news domain. We select the first N sentences as the summary.
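The following is a minimal sketch of the greedy ORACLE selection mentioned above, which also produces the labels used by the extractive Longformer baseline later in this section. The unigram-F1 scorer is our own simplification of ROUGE, and the helper names are illustrative.

```python
from collections import Counter
from typing import List


def unigram_f1(candidate: List[str], reference: List[str]) -> float:
    """Simplified stand-in for ROUGE-1 F1 between two token lists."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    p, r = overlap / len(candidate), overlap / len(reference)
    return 2 * p * r / (p + r) if p + r else 0.0


def greedy_oracle(doc_sents: List[List[str]], ref_tokens: List[str],
                  max_sents: int = 5) -> List[int]:
    """Greedily add the sentence that most improves the score vs. the reference."""
    selected: List[int] = []
    best = 0.0
    while len(selected) < max_sents:
        gains = [
            (unigram_f1(sum((doc_sents[j] for j in selected + [i]), []), ref_tokens), i)
            for i in range(len(doc_sents)) if i not in selected
        ]
        if not gains:
            break
        score, idx = max(gains)
        if score <= best:  # stop when no remaining sentence improves the score
            break
        best, selected = score, selected + [idx]
    return sorted(selected)
```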
SumBasic (Vanderwende et al., 2007) takes the most frequently occurring words in a document cluster for the summary.
TextRank (Barrios et al., 2016) is a graph-based approach that computes sentence importance based on connections between sentences sharing significant words.

KLSum (Haghighi and Vanderwende, 2009) is a greedy approach that adds a sentence to the summary by minimizing the KL divergence.

LEXRANK (Erkan and Radev, 2011) is similar to TextRank but tries to alleviate redundant information by reranking the selected sentences.
Longformer-(base/large) is a supervised extractive method. As OASum contains long documents, we utilize the Longformer model to efficiently process the long sequence, with sentence-level Transformer layers for sentence-level interactions. The oracle sentences are used as labels for predicting the best summary sentences.
LED-(base/large)-OASum. We adapt LED (Beltagy et al., 2020) for the aspect-based summarization task. We directly format the problem as an end-to-end sequence-to-sequence task and fine-tune the corresponding model on OASum. We prepend the aspect to the input document, with a [BOS] token between them, as the sequence input and use the corresponding summary as the sequence output.
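Concretely, the input formatting can be sketched as below, assuming the Hugging Face LED tokenizer; the checkpoint name and the use of the tokenizer's BOS token as the separator are our assumptions about how the released implementation realizes the description above.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")

aspect = "Geography"
document = "Seattle is situated on an isthmus between Puget Sound and Lake Washington. ..."
summary = "Seattle is situated on an isthmus between Puget Sound (an inlet of the Pacific Ocean) and Lake Washington."

# Aspect, a [BOS]-style separator token, then the document.
source = f"{aspect} {tokenizer.bos_token} {document}"
model_inputs = tokenizer(source, max_length=4096, truncation=True, return_tensors="pt")
labels = tokenizer(summary, max_length=256, truncation=True, return_tensors="pt").input_ids
```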

Experiment Settings
We implement our code using pytorch-lightning and Huggingface Transformers. The inputs and outputs are truncated to a maximum of 4096/256 tokens, respectively. As shown in Fig. 2, the selected maximum lengths cover 88.6% of the entire input sequences and 99.1% of the entire output sequences. Since the input length is very long, we can only feed 4 instances to a single GPU for the base model and 2 instances for the large model. To speed up training, Distributed Data-Parallel and Automatic Mixed Precision (FP16) are used. Specifically, we utilize 64 NVIDIA V100 GPUs for base models and 128 NVIDIA V100 GPUs for large models when training both the aspect-agnostic extractive models and the aspect-based abstractive models. The gradient accumulation step is set to 8 to reduce the communication bandwidth; therefore, the effective batch size is 2048. We use Fused-Adam (Kingma and Ba, 2015) implemented in NVIDIA Apex for optimization. The initial learning rate is 1e-4, and it linearly decreases to 0. The betas are 0.9 and 0.999, respectively. We do not apply warm-up for OASum training. Weight decay is 0.01. We evaluate the model 5 times per epoch on the validation set and pick the checkpoint with the highest average ROUGE-1/2/Lsum score for testing.
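For orientation, the optimization setup roughly corresponds to the following plain-PyTorch sketch. We substitute torch.optim.AdamW for the NVIDIA Apex fused optimizer, use a stand-in module in place of the LED model, and treat total_steps as a placeholder; all of these are our assumptions for illustration.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# Stand-in module; in the actual setup this would be the LED model.
model = torch.nn.Linear(1024, 1024)
total_steps = 100_000  # placeholder: derived from dataset size, batch size, and epochs

# AdamW with the stated hyper-parameters (lr 1e-4, betas (0.9, 0.999), weight decay 0.01).
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=0.01
)
# Linear decay from the initial learning rate to 0, with no warm-up.
scheduler = LambdaLR(optimizer, lambda step: max(0.0, 1.0 - step / total_steps))
```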

Results & Analysis
In Table 4, we show the baseline model performance on OASum. The oracle is the strongest baseline and provides the labels for the Longformer models. It outperforms all extractive and abstractive methods except for the ROUGE-2 and ROUGE-L of the LED-large model, indicating that the reference summaries of OASum are more abstractive than extractive. The lead sentences perform similarly to the unsupervised baselines, meaning that the important information is concentrated in the beginning of the documents; however, the leading sentences are not necessarily the best choices, as they underperform the supervised methods. Random selection is the worst choice for the summary. Among the supervised models, the extractive method outperforms the unsupervised methods but is in turn outperformed by the abstractive methods by a large margin. We also include some good and bad generated examples as case studies in Appendix C.1.

Downstream Tasks
To verify that the knowledge inside OASum provides transfer ability, we further apply the model pretrained on OASum to seven abstractive downstream datasets (see Appendix B.1), including three query-based summarization datasets and four aspect-based summarization datasets across different domains. We test our model under zero-shot, few-shot, and fine-tuning settings on these 7 datasets to see whether OASum can benefit the downstream tasks. In general, the model pretrained on OASum outperforms the backbone model on 6 out of 7 tasks in the fine-tuning and zero-shot settings, and on 6 out of 6 tasks (w/o WikiAsp) in the few-shot setting. (WikiAsp has 20 different subsets on different domains; we only report its results for the zero-shot and fine-tuning settings.)

Experiment Settings
For all downstream tasks, we only test the base model to demonstrate the ability of our pretrained checkpoint in an end-to-end setting. We experiment with different decoding hyper-parameters and find that length_penalty = 1.0, num_beams = 4, and no_repeat_ngram_size = 3 consistently achieve optimal performance on multiple datasets in the zero-shot setting. Thus, we keep these parameters for all downstream task experiments. For the backbone LED-base model (denoted as L), we initialize the model using the checkpoint provided by Beltagy et al. (2020) on Huggingface. On top of the backbone model, our model checkpoint is further fine-tuned on OASum (denoted as O) for 20 epochs. Notice that for the fine-tuning and zero-shot scenarios, the WikiAsp results are reported as an average over 20 domains tested independently.
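In code, this decoding configuration corresponds to a generate call like the sketch below. The checkpoint name, query, and input formatting are illustrative assumptions; the decoding parameters are the ones stated above.

```python
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

query = "What is the significance of Ghost Ships in the story?"
document = "..."  # a downstream source document
inputs = tokenizer(f"{query} {tokenizer.bos_token} {document}",
                   max_length=4096, truncation=True, return_tensors="pt")

# Global attention on the first token, as commonly recommended for LED.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    num_beams=4,
    length_penalty=1.0,
    no_repeat_ngram_size=3,
    max_length=256,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```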

Fine-tuning Settings
For the fine-tuning experiments, we directly fine-tune the model on the whole training set and report the ROUGE scores on the test set, selecting the best-performing checkpoint on the validation set. We present all the fine-tuning results in Table 5 with ROUGE-1, ROUGE-2, and ROUGE-Lsum scores. In general, models fine-tuned from our checkpoint consistently perform better and demonstrate a strong advantage in ROUGE scores. Appendix B.3 shows the complete results on the 20 domains of WikiAsp. We find that our fine-tuned models outperform the backbone model on most of the domains, with only a few exceptions. Overall, our experiments demonstrate that pre-training on OASum is an effective approach for improving performance on a variety of aspect- or query-based summarization tasks.

Few-shot Settings
For the few-shot experiments, we fine-tune the models on 0.3%, 1%, and 3% of the training data (see Appendix B.2 for details). Given the difficulty of gathering such data, we think our findings are beneficial across many disciplines. In Table 16, we also show some typical examples.

Zero-shot Settings
For the zero-shot experiments, we only test the models on the whole test set without any optimization on the training data. The zero-shot evaluation results are presented in Table 7. The complete results on the 20 domains of WikiAsp are also shown in Table 13. As we can see, except for the NEWTS dataset, our LED-OASum consistently achieves significantly better results on almost all evaluation metrics. We believe this improvement comes from the rich knowledge contained in the large corpus learned during pre-training. The performance almost doubles on WikiAsp and AQuaMuse, validating that the knowledge is successfully transferred into the generation process. More case studies can be found in Table 15 and Table 16.

Conclusions
In summary, we contribute the first large-scale open-domain aspect-based summarization corpus, collected with good quality by using Wikipedia section titles as aspects and rule-based summary construction. Detailed statistics reveal many different facets of the corpus, confirming its broad coverage. We also outline the methods we use for pre-training the generative language models and present abstractive and extractive results as baselines for future work. Furthermore, we show that our pre-trained model can consistently improve performance on seven widely used downstream tasks, especially in the few-shot and zero-shot settings. We hope our data and pre-trained models can further foster relevant research in this area.

Limitations
First of all, since OASum is automatically curated, it inevitably contains inappropriate summaries that are not strongly correlated with certain aspects. A model trained on it could further propagate such noise and affect other downstream tasks, but we hope the large-scale training can reduce such effects to a minimum. At the current stage, we are not responsible for any products directly built on our results. In the future, a potential denoising mechanism could be designed to further reduce the noisy summaries.
Secondly, we only opt for end-to-end modeling, which requires large computational memory and cost that may not be affordable for everyone. Thus, a meaningful direction would be investigating extract-then-summarize two-step methods for dealing with long-document summarization. Besides, our vanilla dataset contains millions of summaries, which makes it difficult for researchers with limited computational resources to directly reproduce our results. We recommend using a small subset of our corpus if enough computational capability is not immediately available.
Finally, we only explore a simple strategy for controlling the summarization based on input aspects. However, we find it cannot always guarantee aspect-focused generation. How to efficiently and accurately generate specific summaries conditioned on aspects is not only challenging for model design but also difficult for humans to evaluate. We leave these issues for future work.

A Details in Data Statistics
A.1 Top 50 Aspects
In Table 8, we show the 50 most common aspects in OASum and their frequencies. As we can see, these aspects naturally cover many perspectives of an article, serving as good and diverse aspects to be summarized.

A.2 Top 50 Categories
In Table 9, we show the 50 most common categories of Wikipedia pages in OASum and their frequencies. In general, the top-50 and top-10% categories take up around 57.84% and 93.51% of all the categories, respectively.

A.3 Bi-gram coverage and density
We notice that the uni-gram coverage and density presented in Grusky et al. (2018) can only represent token-level extractiveness. However, summarizers typically extract self-contained (Cho et al., 2020) text spans, at the sentence or sub-sentence level, to construct a summary. In such cases, token-level extractiveness cannot represent well how extractive an instance is. It becomes worse when the input document is long, containing different pieces of summary tokens in different places of the document. On the contrary, bi-gram coverage and density reduce the chance of wrongly representing the extractiveness of instances. Thus, in this work, we choose to use bi-gram coverage and density to present the extractiveness/abstractiveness of instances.
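As a simple illustration of the bi-gram variant, the sketch below computes the fraction of summary bigrams that also occur in the article. This is a simplified proxy we use for exposition, not the full fragment-based coverage/density computation of Grusky et al. (2018).

```python
from typing import List, Set, Tuple


def bigrams(tokens: List[str]) -> Set[Tuple[str, str]]:
    """All adjacent token pairs in a token list."""
    return set(zip(tokens, tokens[1:]))


def bigram_coverage(article: str, summary: str) -> float:
    """Fraction of summary bigrams that also appear in the article
    (a simplified proxy for fragment-based bi-gram coverage)."""
    art = bigrams(article.lower().split())
    summ = bigrams(summary.lower().split())
    return len(summ & art) / len(summ) if summ else 0.0


# Example: a highly extractive summary yields coverage close to 1.0.
print(bigram_coverage("seattle is a seaport city on the west coast",
                      "seattle is a seaport city"))
```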

B Details in Experiments B.1 Datasets
We list the 7 downstream datasets below; their statistics are shown in Table 1.

AQuaMuse (Kulkarni et al., 2020) is a query-based multi-document summarization (qMDS) dataset built by automatically mining qMDS examples from question-answering datasets and large document corpora. We follow the preprocessing steps in (Vig et al., 2022).

MA-News (Frermann and Klementiev, 2019) synthesizes multi-aspect summaries by interleaving paragraphs of n_d documents belonging to different aspects and pairing the document with each of its n_d components' reference summaries.

B.2 Hyper-parameters
Fine-tuning. For downstream tasks, we fine-tune the model for 20 epochs on WikiAsp and 50 epochs on the remaining datasets. We then pick the checkpoint with the best average validation ROUGE performance and report its final performance on the test data. In Table 10, we show the hyper-parameters used in the fine-tuning setting for different datasets. For decoding, we keep no_repeat_ngram_size at 3, the beam size is set to 4, and the length penalty is set to 1.0. We use a linearly decreasing learning rate schedule on all tasks without any warm-up. The weight decay is set to 0.01.

Zero/Few-shot. In Table 11, we show the hyper-parameters used for the zero/few-shot settings, where no_repeat_ngram_size is kept at 3/0, the beam size is 4/1, and the length penalty is always set to 1.0. We only use early_stopping for zero-shot. The number of epochs and the learning rate for few-shot training are always 60 and 2e-5, respectively. Warm-up rates are set to 0.05, while weight decay is 0.01. The batch size is 2 for the 0.3% scenario and 4 for the 1% and 3% scenarios. In Table 12, we also show the exact number of instances used for few-shot training. The total number of picked training instances ranges from less than ten to several hundred.

B.3 WikiAsp Full Results
In Table 13, we present the full results on all 20 WikiAsp (Hayashi et al., 2021) domains under the fine-tuning and zero-shot settings. Our LED-OASum consistently achieves near-double zero-shot performance for all domains under almost all ROUGE metrics. The improvements over the fine-tuning results are less substantial but still exceed 0.5 points. We attribute this advance to the rich knowledge contained in our OASum corpus. It is worth noting that the inputs of our OASum are close to the outputs of WikiAsp, but we are not sure whether the information seen by the encoder during our training directly helps when tuning on WikiAsp in the decoding stage.

C.1 OASum Examples
In Table 14, we show two examples of Wikipedia pages, Pokémon and Shanghai, from the OASum test set; the aspect-based summaries are generated by our trained LED-OASum checkpoint. For example, for Cultural influence in Pokémon, the generated summary is coherent, fluent, and "correct", but not related to this specific aspect at all. For Demographics in Shanghai, the first half of the summary is focused on demographics, but the remaining description, "the capital of the province of Zhejiang", is both unrelated and inaccurate.

C.2 LED-OASum Examples
Zero-shot. Here we show three examples from the downstream AQuaMuse, QMSum, and NEWTS datasets under the zero-shot setting in Table 15.
As we can see from the results on AQuaMuse and QMSum, LED-OASum produces much better summaries. For another example from NEWTS, although LED-base achieves higher ROUGE scores, its summary is actually redundant and repetitive. On the contrary, the summary generated by LED-OASum (highlighted in green in the table) stays focused on the chosen aspect and demonstrates good quality.
Few-shot. Besides, we also show one example from the SQuALITY dataset under the few/zero-shot setting in Table 16. Under the zero-shot condition, our LED-OASum generates a much better query-based summary than the original LED-base model, which can also be observed from the ROUGE scores. When the models are further tuned on a small amount of 3% (30) of the training instances, the improvements mainly come from ROUGE-L and ROUGE-Lsum.

Zero-shot: Query in SQuALITY test set: What is the significance of Ghost Ships in the story?
LED-OASum: Space Man Willard is a short story by American writer James Willard, published in 1942. The story is about a space ship that Willard had never seen before. The ship was lost for twenty years, and Willard was the only survivor of the Mary Lou, which had been destroyed by a meteor. Willard's ship, the "Mary Lou II", was the first space ship to sail beyond the Solar System, and the only ship to have been lost for over a thousand years. The Mary Lou was named in honor of his father, who died when the ship was still in space. It was the last ship Willard would see before his ship would be able to return to Earth. The ghost of the ship would come back to him, and he would never see it again. It would be the last time he would see Earth again, and it would be too late for him to see the ship again. He would spend the next twenty years of his life in space alone, and would never return to the Earth again. In the story, Willard imagines that the ghost ship will come to Earth and bring him home. He imagines the ship will be the home of spacemen who can never go home again. And this is what happened to.
LED-base: What is the significance of Ghost Ships in the story? Willard and Larry Dobbin left the Rocket Port for the exploration of an alleged planetoid beyond Pluto. The ship has not been seen or heard from since. J. Willard, II, son of the lost explorer, is planning the manufacture of a super-size exploration ship to be called Mary Lou II, in memory of his father. "Chorus TextColor TextColor He awoke with a start and a cry of alarm ran through him as he thought that perhaps he might still be in the Mary Lou. The warm, smiling face of a man quickly reassured him. Chorus FontSize TextColor "I'll call the captain," the space man said. "He said to let him know when you came to." Chorus of TextColor The ghost ship was a ghost ship. It was the ghost ship, the ghost of a ghost. The ghost ships were ghost ships, ghost ships and ghost ships. The Ghost Ship was ghost ships of ghost ships in the Ghost Ship in his youth. chorusescription TextColor It was ghost ship of ghost ship TextColor When Willard awoke, he could not sleep. And so it was that each night-for Willard did not give up the Earth-habit of keeping time-Willard dreamed of the days he had known on Earth. chorus TextColor A ghost ship in the sky. "Nonsense!" Willard broke in hurriedly, hoping that the dying man would not see through the lie. "We've got the sun's gravity helping us drift back to Earth! We'll be there soon! You'll get well soon and we'll start to work again on a new idea of mine...." His voice trailed helplessly away and the words were lost. He was no longer able to sleep. Chorus of the Ghost ship in space TextColor Chorus Of the ghost ships In the sky, he thought, "It's all right," Willard whispered. The sick man did not hear him. Two tears rolled down his cheeks. His face contorted as he tried to withhold a sob. Chrome TextColor I TextColor "How do you feel, Space Man Willard?" Chorus, chorus of ghosts TextColor chorus chorus "Oh, you know me?" Willard looked at him in surprise, and then smiled, "I don't know you." chorusOf TextColor Darkness TextColor There was nothing to see. Darkness - Darkness Archdemon Darkness Gleaming Darkness (ROUGE-1: 37.12, ROUGE-2: 9.31, ROUGE-L: 16.32, ROUGE-Lsum: 35.52)

GOLD: The Ghost Ship is a legend that sailors and space travelers alike have claimed people see in the moments before they die at sea or in space. In the seconds before Dobbin dies in Willard's arms, he looks out the window of the "Mary Lou" and claims to see the Ghost Ship himself. Throughout Willard's long periods of solitude aboard the "Mary Lou", he thinks he sees the Ghost Ship several times. First, from a distance, as a blinking light advancing closer and closer before turning back and sailing off into dark space; later, he thinks he sees the ship return, only this time it passes nearer before turning back and leaving again. With each return of the Ghost Ship, Willard believes he sees it clearer than he had before. After decades adrift in space, Willard believes a ship has finally come to rescue him. He does not think it is the Ghost Ship because it is solid, and he is greeted by a crew of people. However, the captain explains that the longer a vessel spends lost in space, the more it loses itself and slips into a kind of un-reality, along with those aboard. The more the "Mary Lou" drifted into this space, the more real the Ghost Ship became to Willard. Willard realizes that the "Mary Lou" has become a "ghost ship" herself.

Few-shot: Query in SQuALITY test set: What is the significance of Ghost Ships in the story?
LED-OASum: Ghost Ships are ghost-like ships that haunt the Solar System that gave it birth. They are the home of spacemen who can never go home again. When your last measure of fuel is burnt and your ship becomes a lifeless hulk-the Ghost will come-for you. Space Man Willard has a friend who died in space, J. Willard II, the son of the lost explorer. He is planning to manufacture a super-size exploration ship to be called Mary Lou II, in memory of his father. Each night he sees the sight of the ship and thinks he will see it again. However, after several dreams and visions of the ghost ship haunting him, it is clear that the ship is not a phantom ship. Instead, a life and blood rocket ship from Earth is tormenting him once again. Starlight shines on it and not through it! Its lines, window, vents were all solid and had none of the spirit-like quality he remembered in the Ghost Ship in his youth. He yawns and sighs as he remembers the days he had spent on Earth. Although Willard believed that the ghost ships were superstitions, he saw them from his ship, and he tried to rip the memories of Earth out. (ROUGE-1: 45.80, ROUGE-2: 10.48, ROUGE-L: 21.31, ROUGE-Lsum: 42.63)
LED-base: What is the significance of Ghost Ships in the story? GALACTIC GHOSTS IN SPACE Willard is the captain of the Mary Lou ship. He is also a friend of Dobbin, a scientist who had come to explore the Solar System. He knows that Dobbin has died a horrible death, and the stories that he tells about the ghost ship that he has read about in the newspaper. He also knows that the unnamed ship which came to capture Dobbin's attention is a ghost ship. However, despite these beliefs, Willard does not believe that the ship has actually landed on the planet and is actually traveling to see Dobbin again. For a moment the reason in his mind tottered on a balance. Doubt assailed him. Was this the Ghost Ship come to torment him again? But no, it does not. He is greeted by a small crowd of people and his room is filled with empty rooms. The ghost ship seems to have abandoned its passengers and crew members, leaving Willard alone in his room. He does not understand that his friends have just landed their ship here and are experiencing the land for the first time. perspectemale He is taken aback by the large crowd that has gathered around him, and he begins to imagine that his ship has landed on Earth. He would see Earth again! That single thought runs through his mind constantly. The tapping of the space-telegrapher reassured him. He hears a message from the ship that says "CALLING SPACE SHIP MARY LOU," the message rapped out, "Yes, that is it!" With trembling fingers that he could scarcely control, old Willard sent the answering message. It is considered to be the most important message of the story. (ROUGE-1: 44.23, ROUGE-2: 13.13, ROUGE-L: 20.77, ROUGE-Lsum: 41.93)

Table 16: Examples of aspect-based summaries under the zero/few-shot setting. Few-shot means the model is fine-tuned on randomly chosen 3% samples from the training set.

Figure 2: Input (top) and output (bottom) length in tokens, with probability density functions (left) and cumulative distribution functions (right). The red dashed lines represent the truncation we used for model training. L and P represent the token length and cumulative probability, respectively.

Figure 5: Cumulative proportion of the aspect distribution. The horizontal axis represents the aspects sorted from high frequency to low frequency.

Figure 6: Word cloud based on the top 400 categories drawn from the first-level category names in OASum. Word size is proportional to word count. The size of the dominant category human is reduced by a factor of 10 relative to the whole category set.

Table 1: Statistics of query/aspect-based summarization datasets, with columns Type, Dataset, Domain, #Instances, #Input Tk., #Output Tk., #Asp. Type, and Method. The last column contains the methods of dataset creation: A stands for "Automatic", M stands for "Manual". #Input Tk. and #Output Tk. represent the input and output token lengths, respectively. #Asp. Type is the number of all aspect types. #Instances is the total number of (article, summary) pairs in the corresponding dataset.

Table 2: Summary quality with different thresholds. The scores are in the range of 1-5, representing very bad, bad, fair, good, and excellent, respectively.

Table 4: Baseline results on the OASum test set. Y and N mean including the aspect or not.

Table 5: Fine-tuning results on downstream tasks. WikiAsp results are averaged over all 20 domains. L represents LED-base; O represents LED-OASum-base.

Table 7: Zero-shot results on downstream tasks. WikiAsp results are averaged over all 20 domains. L represents LED-base; O represents LED-OASum-base.

Table 8: The 50 most common aspects and their frequencies.

Table 9: The 50 most common categories, the corresponding Wikidata IDs, and their frequencies. unincorporated., aspect., city., town., association., television., and census. are short for unincorporated community in the United States, aspect in a geographic region, city/town of the United States, association football club, television series episode, and census-designated place, respectively.

Table 11: Zero-shot hyper-parameters. #Mai, #Mio, and #Mao represent max input length, min output length, and max output length, respectively.

Table 13: Fine-tuning and zero-shot performance on the WikiAsp datasets.