Controlling keywords and their positions in text generation

One of the challenges in text generation is to control text generation as intended by the user. Previous studies proposed specifying the keywords that should be included in the generated text. However, this approach is insufficient to generate text that reflects the user's intent. For example, placing an important keyword at the beginning of the text would help attract the reader's attention; however, existing methods do not enable such flexible control. In this paper, we tackle a novel task of controlling not only keywords but also the position of each keyword in text generation. To this end, we propose a task-independent method that uses special tokens to control the relative position of keywords. Experimental results on summarization and story generation tasks show that the proposed method can control keywords and their positions. The experimental results also demonstrate that controlling the keyword positions can generate summary texts that are closer to the user's intent than the baseline.


Introduction
One of the challenges in text generation is to generate text that is consistent with the user's intent. Many methods have been proposed to specify the keywords that should be included in the generated text to reflect the user's intent. In summarization, providing the model with keywords that should be included in the summary makes it possible to generate summaries that focus on specific parts of the document (Fan et al., 2018; He et al., 2020; Dou et al., 2021). In story generation, keywords are used to control the narrative storyline (Jain et al., 2017; Fan et al., 2019; Yao et al., 2019). In other tasks such as e-commerce generation, review generation, and question generation, keywords are also used to control text generation (Chan et al., 2019; Shao et al., 2021; Ni and McAuley, 2018; Chan et al., 2021; Zhang and Zhu, 2021). In addition, there are more advanced methods that specify the order of the keywords to be included in order to control the rough storyline of the generated text (Su et al., 2021; Shao et al., 2021).
However, these methods cannot generate texts that reflect more fine-grained intentions. To ensure that the generated text reflects the intended importance of keywords, it may be necessary to adjust their position within the text. For example, important keywords such as topic words and eye-catching words can be placed at the beginning of the text to attract the reader's attention, while keywords for supplementary information can be placed in the middle or later part of the text. Controlling the specific position of keywords in the generated text is a challenge in reflecting more specific user intentions and generating texts that attract readers. However, as far as we know, no previous work has tackled this.
In this study, we address a novel task of controlling keywords and the position of each keyword in text generation. We propose a task-independent method that uses special tokens to control text generation, inspired by previous work that controlled text attributes by using special tokens (Iwama and Kano, 2019; Lakew et al., 2019; Martin et al., 2020). Specifically, we specify where each keyword is placed by providing the model with special tokens that represent the target relative position of the keyword (e.g., 0-10%, 10-20%) and the target text length (e.g., 20-24 words, 25-29 words). The reason for using relative positions rather than absolute positions is that it is more practical to specify relative positions such as the early, middle, or latter part of the text. The reason for also controlling the length of the target text is that text length is considered to be an important factor that users want to control when considering where to place keywords. During training, we provide the model with control tokens, including keywords randomly extracted from the target text, the position of each keyword, and the length of the target text. The model is trained with cross-entropy loss as in conventional text generation, which enables the model to learn the correspondence between the input control tokens and the target text.
We perform a comprehensive evaluation on summarization and story generation tasks. First, we show that our method can control keywords and their positions in both tasks (Section 3.2). Second, we demonstrate that our method can generate summary texts that are more similar to the gold summary than the baseline, indicating that text closer to the user's intent can be generated (Section 3.3). Finally, we show through case studies that a model with keyword position control can reflect the user's fine-grained intentions (Section 3.4).

Model
We use a BART model (Lewis et al., 2020) for the summarization task and a GPT model (Radford et al., 2018) for the story generation task. When using the BART model, the source document is combined with the control tokens, namely (1) the keywords in the text to be generated, (2) the position of each keyword, and (3) the length of the text to be generated, and the result is given to the encoder as shown in Figure 1. When using the GPT model, these control tokens are given to the decoder. As with regular text generation using GPT and BART, the model is trained to maximize the conditional probabilities $p(y_i \mid y_{<i}, x)$ using the cross-entropy loss, where $y$ denotes the target text and $x$ denotes the input to the model, including the control tokens and, in the summarization task, the source document.
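Written out, the training objective described above is the standard token-level negative log-likelihood. The symbols $T$ (the number of target tokens) and $\theta$ (the model parameters) are notation introduced here for illustration and do not appear in the paper; the formula also ignores the label smoothing mentioned in Appendix A.1.

```latex
\mathcal{L}(\theta) = -\sum_{i=1}^{T} \log p_\theta\left(y_i \mid y_{<i}, x\right)
```

Minimizing this loss is equivalent to maximizing the conditional probabilities $p(y_i \mid y_{<i}, x)$, with the control tokens entering only through the input $x$.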

Control tokens
Inspired by existing work that controls text attributes with special tokens (Iwama and Kano, 2019; Lakew et al., 2019; Martin et al., 2020), we provide the model with the position of each keyword and the text length as special tokens. For example, if the keyword phrase "two dogs" is located in the range of 20-30% of the text and the text length is in the range of 50-54 words, the model will be given "[LENGTH50][SEP]two dogs[POSITION20]" as the control token. [LENGTH50] and [POSITION20] are new tokens added to the vocabulary, and the corresponding embeddings are initialized randomly.
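The control string in the example above can be assembled with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `build_control_string` is hypothetical, and joining several keywords with `[SEP]` is an assumption, since the text only shows the single-keyword case.

```python
from typing import List, Optional, Tuple


def build_control_string(
    length_bucket: Optional[int],
    keywords: List[Tuple[str, Optional[int]]],
) -> str:
    """Assemble a control string such as "[LENGTH50][SEP]two dogs[POSITION20]".

    length_bucket: quantized text length (e.g. 50 for 50-54 words), or None.
    keywords: (phrase, position_bucket) pairs; position_bucket may be None
              when only the keyword itself is given.
    Assumption: multiple keywords are separated by [SEP].
    """
    segments = []
    if length_bucket is not None:
        segments.append(f"[LENGTH{length_bucket}]")
    for phrase, position_bucket in keywords:
        position = "" if position_bucket is None else f"[POSITION{position_bucket}]"
        segments.append(f"{phrase}{position}")
    return "[SEP]".join(segments)


# Reproduces the example from the text.
print(build_control_string(50, [("two dogs", 20)]))
```

In a real system the `[LENGTH..]` and `[POSITION..]` strings would also have to be registered as new vocabulary entries so that each one maps to a single, randomly initialized embedding.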
Note that in this study, control tokens that represent the oracle information of the target text are fed to the model during both training and inference. This experimental setting is appropriate because the goal of this study is to generate the intended text by feeding additional information to the model. A direction in which the model automatically determines keywords and their positions (i.e., control tokens are not given to the model) is also possible, but we leave this to future work.
We extract control tokens from the target text as follows. Details are also given in Appendix A.3.

Keywords
Keywords in this paper are not limited to important words in the target text but mean any phrase consisting of one to three consecutive words in the target text. For example, from the text "Marcia was looking forward to trying hang gliding.", the phrases "Marcia", "was", "looking forward", "to trying", "trying hang gliding", etc. are first extracted as keyword candidates. However, phrases that begin with frequent words carrying little meaning, such as "was" and "to trying", are excluded from the keyword candidates because they are considered unlikely to be given as keywords by the user. During training, a random number of phrases from the keyword candidates are given to the model as keywords.
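The candidate extraction described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `keyword_candidates` is hypothetical, the stop-word set is a tiny illustrative stand-in, and whitespace splitting replaces the NLTK tokenization mentioned in Appendix A.3. The exclusion rule (drop phrases whose first word is a stop word or frequent word) follows Appendix A.3.

```python
# Illustrative stop-word list; the actual list used in the paper is not specified here.
STOP_WORDS = {"was", "to", "the", "a", "an", "of", "in", "and"}


def keyword_candidates(words, max_len=3, stop_words=STOP_WORDS):
    """Return all phrases of 1..max_len consecutive words whose first word
    is not a stop word."""
    candidates = []
    for start in range(len(words)):
        if words[start].lower() in stop_words:
            continue  # phrases starting with a stop word are excluded
        for n in range(1, max_len + 1):
            if start + n <= len(words):
                candidates.append(" ".join(words[start:start + n]))
    return candidates


words = "Marcia was looking forward to trying hang gliding".split()
cands = keyword_candidates(words)
print(cands)
```

With this rule, "looking forward" and "trying hang gliding" remain candidates, while "was" and "to trying" are dropped because their first word is in the stop-word set.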

Keyword Position
The position of each keyword is expressed as a relative position. Specifically, the absolute position of the target keyword, counted from the beginning of the text, is divided by the number of words in the entire text, and the result, quantized in units of 10%, is given to the model. The reason for using relative positions rather than absolute positions is that it is more practical to specify relative positions such as the early, middle, and latter parts of the text.

Text Length
We feed the model the number of words in the target text quantized in 5-word units. The reason for also controlling the length of the target text is that text length is considered to be an important factor that users want to control when considering where to place keywords. In addition, we hypothesize that it is difficult for a model to determine the specific position at which to place keywords based on relative position alone, and that the control performance can be improved by providing length information together.
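The two quantization steps can be illustrated with a short sketch. The function names are hypothetical, and counting the keyword position from a 0-based index of its first word is an assumption; the text does not state the indexing convention.

```python
def length_bucket(num_words: int, unit: int = 5) -> int:
    """Quantize a text length in 5-word units, e.g. 52 words -> 50 (50-54 words)."""
    return (num_words // unit) * unit


def position_bucket(start_index: int, num_words: int, unit_percent: int = 10) -> int:
    """Quantize the relative keyword position in 10% units.

    start_index: 0-based index of the keyword's first word (assumed convention).
    Returns the lower edge of the bucket, e.g. 20 for the 20-30% range.
    """
    if not 0 <= start_index < num_words:
        raise ValueError("start_index must lie inside the text")
    percent = start_index * 100 // num_words  # integer arithmetic avoids float rounding
    return (percent // unit_percent) * unit_percent


# A keyword starting at word 12 of a 52-word text falls in the 20-30% range,
# and the text length falls in the 50-54 word range.
print(position_bucket(12, 52), length_bucket(52))
```

These values correspond to the [POSITION20] and [LENGTH50] tokens in the earlier "two dogs" example.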

Experiment setting
We perform a comprehensive evaluation on well-established summarization and story generation tasks. These two tasks have different characteristics. (1) In the summarization task, the model extracts information from a source document and compresses it into a short text based on the given control tokens. (2) In the story generation task, the model generates a text based solely on the given control tokens. For the summarization task, we used the CNN/DailyMail (Hermann et al., 2015) and XSum (Narayan et al., 2018) datasets and the BART LARGE model (400M parameters) (Lewis et al., 2020). For the story generation task, we used the ROCStories (Mostafazadeh et al., 2016) dataset and the GPT2 model (120M parameters) (Radford et al., 2018). In all experiments, training and inference were performed three times, and the mean score is reported. See Appendix A for more details on the experimental setup.

Evaluation of keyword position control
First, we check whether the given keywords are placed at the given positions. We evaluate two accuracies: the rate at which the generated text includes all target keywords, and the rate at which all target keywords are placed at their respective target positions.
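The two accuracies can be computed per example along the following lines. This is a minimal sketch with hypothetical names (`find_starts`, `evaluate_example`). It assumes whitespace tokenization, exact string matching, and that a keyword counts as correctly placed if any of its occurrences falls in the target 10% bucket; the paper does not specify how repeated occurrences are scored.

```python
def find_starts(words, phrase_words):
    """Return the start indices of every occurrence of phrase_words in words."""
    n = len(phrase_words)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase_words]


def evaluate_example(generated, targets):
    """targets: list of (keyword, target_bucket) pairs, buckets in {0, 10, ..., 90}.

    Returns (all_included, all_positioned) for one generated text.
    """
    words = generated.split()
    all_included, all_positioned = True, True
    for phrase, target_bucket in targets:
        starts = find_starts(words, phrase.split())
        if not starts:
            all_included = all_positioned = False
            continue
        buckets = {(s * 10 // len(words)) * 10 for s in starts}
        if target_bucket not in buckets:
            all_positioned = False
    return all_included, all_positioned


text = "two dogs ran across the park while a cat watched them"
print(evaluate_example(text, [("two dogs", 0), ("cat", 70)]))
```

Averaging the first flag over the test set gives the keyword inclusion accuracy, and averaging the second gives the position control accuracy.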
Table 1 indicates that our method using special tokens (+Pos and +Pos+Len) can generate text that includes the desired keyword at the desired position. Providing text length information along with position information (+Pos+Len) improves the accuracy of keyword position control, particularly in datasets with long texts (CNN/DM and ROCStories). This is because combining relative position and length information enables the model to place the keywords in appropriate positions. The accuracy of keyword inclusion also improves when the keyword positions are given. We suspect that informing the model in advance of where the keywords should be placed prevents it from forgetting to place keywords in the text. The control accuracy is much lower in the story generation task than in the summarization task. This may be because the model is not given a source document and generates text from the control tokens only, which makes it more likely to generate a context that is inappropriate for keyword inclusion.
We present a more detailed analysis in Table 2. Here, we check whether a text was generated with the keywords in the correct position and, if not, to what extent the keywords were misplaced or whether they were missing from the text. At all target positions, the accuracy of keyword position control is improved compared with keyword-only control, suggesting the effectiveness of our approach. The closer the target position is to the beginning of the text, the higher the success rate of keyword inclusion and positional control, indicating that the model is better at placing keywords earlier in the text.
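The four outcome categories reported in Table 2 can be assigned with a small helper. This is a sketch with a hypothetical function name, and it assumes that "within 10%" means the generated keyword lands in a bucket adjacent to the target bucket; the paper does not spell out how the deviation is measured.

```python
from collections import Counter
from typing import Optional


def classify_outcome(generated_bucket: Optional[int], target_bucket: int) -> str:
    """Map a generated keyword position to one of the Table 2 categories.

    generated_bucket: 10%-bucket where the keyword appeared, or None if absent.
    Assumption: deviation is the difference between bucket lower edges.
    """
    if generated_bucket is None:
        return "Not included"
    diff = abs(generated_bucket - target_bucket)
    if diff == 0:
        return "Correct position"
    if diff <= 10:
        return "Within 10% diff"
    return "Over 10% diff"


# Tally outcomes for a handful of made-up (generated, target) pairs.
pairs = [(20, 20), (30, 20), (60, 20), (None, 20)]
print(Counter(classify_outcome(g, t) for g, t in pairs))
```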

Evaluation of summary content control
We show that controlling the text makes it easier for the user to generate the intended text in the summarization task. Table 4 shows the results of the summarization evaluation by the ROUGE score (Lin, 2004). Note that we exclude target keywords from both the target and generated summaries to reduce the effect on the ROUGE score of giving target keywords. We can see that the score is improved by controlling the keyword positions and the text length, indicating that providing this additional information makes it easier to generate text that is close to the user's intended content.

Target keyword position (relative position) in the generated summary: 0-10%, 10-20%, 20-30%, 30-40%, 40-50%, 50-60%, 60-70%, 70-80%
Table 2: Detailed evaluation of the control of the keyword and its position in the CNN/DM dataset. We classified, for each target relative position, whether the text was generated with the keyword in the target position (Correct position), with a positional deviation within 10% (Within 10% diff), with a positional deviation greater than 10% (Over 10% diff), or the keyword not in the text (Not included).

Keyword & Position: true miracle dog (0%)
Generated summary: "She's a true miracle dog and she deserves a good life," foster mother says. Theia was apparently hit by a car and buried in a field. Four days later, she was found emaciated and dirt-covered by a farm worker. A fundraising page has raised more than $10,000.

Keyword & Position: true miracle dog (40%)
Generated summary: A stray dog apparently hit by a car and buried in a field survives. The dog, named Theia, is "a true miracle dog," her foster owner says. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity. A fundraising page has raised more than $10,000.

Keyword & Position: true miracle dog (90%)
Generated summary: Stray dog apparently hit by a car and buried in a field survives. Four days later, she is found emaciated and dirt-covered on a farm. She has a dislocated jaw, leg injuries and a caved-in sinus cavity. A good Samaritan has set up a fundraising page for "the true miracle dog"

Keyword & Position: hit by (40%), bully breed mix (60%)
Generated summary: A stray dog in Washington State apparently died after being hit by a car. Four days later, she was found emaciated and dirt-covered in a field. Theia, a bully breed mix, has a dislocated jaw, leg injuries and a caved-in sinus cavity.

Keyword & Position: hit by (10%), bully breed mix (0%)
Generated summary: Theia, a bully breed mix, was apparently hit by a car and buried in a field. Four days later, she was found emaciated and dirt-covered by a farm worker. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity. A fundraising page has raised more than $10,000.

Error cases

Keyword & Position: buried in (80%)
Generated summary: Stray dog apparently hit by a car, apparently killed and buried in a field. Four days later, the dog manages to stagger to a nearby farm. She was found by a farm worker, who took her to a vet for help. Theia is now receiving treatment.

Keyword & Position: whacked (70%)
Generated summary: A stray dog in Washington State apparently died after being hit by a car. Four days later, she was found emaciated and dirt-covered by a farm worker. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity. A fundraising page has raised more than $10,000.

Case study
To better understand how the model behaved, we show generated examples in Table 3. We can control the keywords and their positions, although cases exist in which the position of a keyword deviates slightly from the target position. We also see that by giving different positions for the keywords, we can generate several valid texts with different characteristics. For example, placing the keyword "true miracle dog" at the 0% position generates a text that draws the reader's attention with an eye-catching keyword at the beginning of the text. In contrast, placing that keyword at the 90% position generates a narrative-style text that describes events in chronological order. We also see that even when multiple keywords are given, the order of the keywords can be adjusted by controlling the position of each keyword.
We also show some error cases. When a keyword position near the end of the text is specified, the instruction is often ignored and the keyword is placed in a completely different position or not included in the text. As can be seen from the results in Table 2, the model tends to be poor at placing keywords toward the end of the text. One reason may be that the closer the target position is to the end of the text, the more likely it is that the context generated so far is inappropriate for keyword inclusion. We leave a more detailed analysis for future work. We show additional generated examples in Appendix D.

Conclusion
In this paper, we tackled a novel task of controlling keywords and the position of each keyword in text generation. We experimented with the summarization task and the story generation task. We showed that a task-independent method can control the keyword positions in both tasks. We also showed that our method can generate summary texts that are more similar to the gold summary, indicating that text close to the user's intent can be generated.

A Experiment setting details
A.1 Hyper-parameter
Adam (Kingma and Ba, 2015) was used as the optimizer, with $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-6}$, and an L2 regularization factor of 0.01. The learning rate was warmed up over the first 6% of the training steps and then decayed linearly. The dropout rate was 0.1. The batch size was 32. The label smoothing (Szegedy et al., 2016) was 0.1.
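The learning-rate schedule described above can be sketched as a plain function of the training step. The function name `lr_at_step` is hypothetical, the default peak value of 2e-5 is taken from the task-specific settings reported later in this appendix, and decaying all the way to zero at the final step is an assumption; the text only says the rate "decayed linearly".

```python
def lr_at_step(step: int, total_steps: int, peak_lr: float = 2e-5,
               warmup_ratio: float = 0.06) -> float:
    """Linear warm-up over the first 6% of steps, then linear decay.

    Assumption: the decay reaches zero at total_steps.
    """
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Inspect the schedule at a few points of a 1000-step run.
for s in (0, 30, 60, 530, 1000):
    print(s, lr_at_step(s, 1000))
```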

Summarization task
The learning rate was set to $2 \times 10^{-5}$ for both models. However, since the weights of the newly added special tokens were trained from their initial state, a larger learning rate of $1 \times 10^{-3}$ was used for the word embedding weights. The number of epochs was 10. During inference, the summary text was generated using beam search. For the CNN/DM dataset, the number of beams was 4 and the length penalty was 2.0. For the XSum dataset, the number of beams was 6 and the length penalty was 1.0. The maximum number of tokens for the source document was set to 1024 and the maximum number of tokens for the summary text was set to 128; any text with more tokens was truncated at the end.

Story generation task
The learning rate was set to $2 \times 10^{-5}$, and the learning rate for the word embedding weights was set to $1 \times 10^{-3}$. The number of epochs was 30. During generation, texts were generated using top-p sampling with p = 0.95. The temperature was set to 0.1. The maximum number of tokens for the target text was set to 128, and any text with more tokens was truncated at the end.
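To make the decoding settings concrete, the following is a generic sketch of temperature scaling followed by top-p (nucleus) filtering for a single decoding step. It is not the authors' code: the function names are hypothetical, and production libraries implement the same idea on tensors. The defaults mirror the values reported above (p = 0.95, temperature 0.1).

```python
import math
import random


def top_p_distribution(logits, p=0.95, temperature=0.1):
    """Return {token_id: probability} restricted to the smallest set of tokens
    whose cumulative probability reaches p, renormalized to sum to 1."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng, p=0.95, temperature=0.1):
    """Draw one token id from the filtered distribution."""
    dist = top_p_distribution(logits, p, temperature)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


print(sample_token([2.0, 1.0, 0.0], random.Random(0)))
```

With a temperature as low as 0.1 the distribution becomes very peaked, so the nucleus often contains a single token and sampling behaves almost greedily.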

A.3 Dataset and Control tokens
The statistics of the dataset are shown in Table 5.
For the CNN/DM dataset, we followed the data split proposed by Yoon et al. (2020). For the XSum dataset, we followed the data split proposed by Narayan et al. (2018). For ROCStories, we split the data 8:1:1 into training, development, and test sets. Control tokens (keywords, the position of each keyword, and the text length) were extracted from the target text and given to the model for training. Word tokenization of the text was done with the NLTK library to obtain keywords and text length. Note that the model receives the text tokenized into subwords. Therefore, the number of tokens the model receives differs from the pre-calculated text length.
In training, the model is provided with each of the control tokens (keywords, keyword positions, and text length) with a certain probability. We use this trained model to generate texts under four different settings (Keyword, +Len, +Pos, and +Len+Pos in Table 1). For the model that does not use control tokens (w/o Control), model training and inference were performed separately from the model described above.

Keywords
A sequence consisting of one to three consecutive words was obtained from the target text as a keyword candidate. Phrases whose first word is a stop word or a frequent word were excluded from the keyword candidates because they are considered unlikely to be given as keywords by the user. During training, from zero to three keywords were randomly selected from the keyword candidates and given to the model for each epoch.
During inference, one to three keywords were given to the model in the experiment of Table 1, and one keyword was given to the model in the experiments of Table 2 and Table 4. In both training and inference, no keyword is a subsequence of another keyword.

Keyword Positions
We obtained the relative position of the above keywords in the target text and fed it to the model. However, during training, for each keyword, with a probability of 10% the keyword position was not given to the model and only the keyword was used.

Text Length
We obtained the length of the target text in words and gave it to the model. During training, with a probability of 10%, the length of the text was not given to the model.
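The per-example sampling of control tokens during training can be sketched as follows. The names `contains_phrase` and `sample_training_controls` are hypothetical, and the sketch assumes that "subsequence" means a contiguous run of words, that keyword candidates arrive with their position buckets already computed, and that a draw may yield fewer keywords than requested when the remaining candidates overlap with those already chosen.

```python
import random


def contains_phrase(longer: str, shorter: str) -> bool:
    """True if the words of `shorter` occur contiguously inside `longer`."""
    lw, sw = longer.split(), shorter.split()
    return any(lw[i:i + len(sw)] == sw for i in range(len(lw) - len(sw) + 1))


def sample_training_controls(candidates, length_bucket, rng,
                             drop_prob=0.1, max_keywords=3):
    """candidates: list of (phrase, position_bucket) pairs.

    Picks zero to three non-overlapping keywords, hides each keyword's position
    with probability drop_prob, and hides the length with probability drop_prob.
    """
    k = rng.randint(0, min(max_keywords, len(candidates)))
    chosen = []
    for phrase, bucket in rng.sample(candidates, len(candidates)):
        if len(chosen) == k:
            break
        overlaps = any(contains_phrase(p, phrase) or contains_phrase(phrase, p)
                       for p, _ in chosen)
        if overlaps:
            continue  # no keyword may be a subsequence of another
        chosen.append((phrase, None if rng.random() < drop_prob else bucket))
    length = None if rng.random() < drop_prob else length_bucket
    return length, chosen


cands = [("Marcia", 0), ("looking forward", 20), ("hang", 70), ("hang gliding", 70)]
print(sample_training_controls(cands, 5, random.Random(0)))
```

Resampling on every epoch, as described above, exposes the model to many different keyword and position combinations for the same target text.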

B Generating diverse texts
Here, we show that by controlling keyword positions, we can generate a variety of texts from a specific keyword. The ability to generate diverse texts will enable users to select their intended text from among multiple generated texts. In the normal setting (w/o Control) and the keyword-only setting, we generate 10 different texts from a single input. When using keyword positions, we provide the model with 10 different keyword positions for one particular keyword and generate 10 different texts. For each set of generated texts, the diversity is evaluated with the Self-BLEU (Zhu et al., 2018) score. The results in Table 6 show that the diversity of the generated text is improved by providing a variety of keyword positions. In particular, we used beam search to generate multiple texts in the summarization task, which resulted in very low diversity in the generated texts, but this reduction in diversity was mitigated by controlling the position. The generated examples in Table 3 also show that controlling the position produces a variety of valid texts from a particular keyword.

C Generation specifying random positions
In Section 3.2, we showed that we can generate texts by specifying oracle keyword positions extracted from the target texts. Here, we show that we can also generate text by specifying arbitrary keyword positions. Specifically, we compare the accuracy of position control when oracle keyword positions are given and when randomly selected keyword positions are given. Note that the keyword given to the model is a single oracle keyword extracted from the target text and is the same in both settings.
Table 7 shows that position control is still possible when a random position is specified. However, when specifying random positions, the accuracy of keyword inclusion is slightly lower and the accuracy of keyword position control is significantly lower. This is because positions at which the keyword is difficult to place can be specified. For example, a keyword originally used at the end of the text is difficult to place early in the text.

D Generated samples
We show some samples of texts generated by our method in Table 8, Table 9, and Table 10.

E Limitations
Depending on oracle information
In our study, text generation is controlled by providing the model with control tokens extracted from the target text. The accuracy improvement in keyword inclusion and position control in our experiments is due to this additional information, not to improved performance of the model itself. Because the goal of this study is to enable users to control the model by providing additional information such as keywords and positions, this design is intentional. However, selecting appropriate keywords and placing those keywords in appropriate positions without relying on oracle information is one of the challenges for the future.

Depending on length information
The reason for using relative positions instead of absolute positions is that we believe there are few situations in which the user wants to specify the exact absolute position of a keyword, and it is more practical to control the relative positions of the keywords. Our method requires that the model be given a target text length, which may impose an extra burden on the user in practice. Experimental results showed that length information itself is not essential for relative position control, but it is one key to improving performance. Lee et al. (2018) proposed a method for predicting the target text length from the source document in machine translation. By incorporating this method, it may be possible to control the relative positions of keywords without providing additional length information.

Insufficient performance
The experimental results in Table 1 show that the accuracy of keyword inclusion and keyword position control is low, especially in story generation. The reason for this may be that the model does not generate an appropriate context for the inclusion of keywords because the source document is not given. In the summarization task, the accuracy of keyword position control is also far from perfect. Since it is possible to select only the desirable texts from among the generated texts, the control need not necessarily succeed 100% of the time. However, if the success rate of the control improves, the efficiency of generation will improve. A deeper investigation of the causes of poor performance and improvement of the control accuracy are challenges for future work. One idea for improving the performance of story generation is to give the model several words at the beginning of the text, which may make it easier for the model to generate an appropriate context.
Source document
Never mind cats having nine lives. A stray pooch in Washington State has used up at least three of her own after being hit by a car, apparently whacked on the head with a hammer in a misguided mercy killing and then buried in a field - only to survive. That's according to Washington State University, where the dog - a friendly white-and-black bully breed mix now named Theia - has been receiving care at the Veterinary Teaching Hospital. Four days after her apparent death, the dog managed to stagger to a nearby farm, dirt-covered and emaciated, where she was found by a worker who took her to a vet for help. She was taken in by Moses Lake, Washington, resident Sara Mellado. "Considering everything that she's been through, she's incredibly gentle and loving," Mellado said, according to WSU News. "She's a true miracle dog and she deserves a good life." Theia is only one year old but the dog's brush with death did not leave her unscathed. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity - and still requires surgery to help her breathe. The veterinary hospital's Good Samaritan Fund committee awarded some money to help pay for the dog's treatment, but Mellado has set up a fundraising page to help meet the remaining cost of the dog's care. She's also created a Facebook page to keep supporters updated. Donors have already surpassed the $10,000 target, inspired by Theia's tale of survival against the odds. On the fundraising page, Mellado writes, "She is in desperate need of extensive medical procedures to fix her nasal damage and reset her jaw. I agreed to foster her until she finally found a loving home." She is dedicated to making sure Theia gets the medical attention she needs, Mellado adds, and wants to "make sure she gets placed in a family where this will never happen to her again!" Any additional funds raised will be "paid forward" to help other animals. Theia is not the only animal to apparently rise from the grave in recent weeks. A cat in Tampa, Florida, found seemingly dead after he was hit by a car in January, showed up alive in a neighbor's yard five days after he was buried by his owner. The cat was in bad shape, with maggots covering open wounds on his body and a ruined left eye, but remarkably survived with the help of treatment from the Humane Society.
Gold summary
Theia, a bully breed mix, was apparently hit by a car, whacked with a hammer and buried in a field. "She's a true miracle dog and she deserves a good life," says Sara Mellado, who is looking for a home for Theia.

Keyword: w/o Control
Generated summary: A dog apparently hit by a car and buried in a field appears to have survived. Four days after her apparent death, the dog stagger to a nearby farm. She was found by a worker who took her to a vet for help. A fundraising page has raised more than $10,000 for the dog's care.

Keyword: true miracle dog (w/o Position Control)
Generated summary: Stray dog apparently hit by a car and buried in a field survives. She was found emaciated and dirt-covered by a farm worker four days later. The dog, named Theia, has a dislocated jaw, leg injuries and a caved-in sinus cavity. "She's a true miracle dog and she deserves a good life," a foster mother says.

Keyword: true miracle dog (0%)
Generated summary: "She's a true miracle dog and she deserves a good life," foster mother says. Theia was apparently hit by a car and buried in a field. Four days later, she was found emaciated and dirt-covered by a farm worker. A fundraising page has raised more than $10,000.

Keyword: true miracle dog (40%)
Generated summary: A stray dog apparently hit by a car and buried in a field survives. The dog, named Theia, is "a true miracle dog," her foster owner says. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity. A fundraising page has raised more than $10,000.

Keyword: true miracle dog (90%)
Generated summary: Stray dog apparently hit by a car and buried in a field survives. Four days later, she is found emaciated and dirt-covered on a farm. She has a dislocated jaw, leg injuries and a caved-in sinus cavity. A good Samaritan has set up a fundraising page for "the true miracle dog"

Keyword: hit by (40%), bully breed mix (60%)
Generated summary: A stray dog in Washington State apparently died after being hit by a car. Four days later, she was found emaciated and dirt-covered in a field. Theia, a bully breed mix, has a dislocated jaw, leg injuries and a caved-in sinus cavity.

Keyword: hit by (10%), bully breed mix (0%)
Generated summary: Theia, a bully breed mix, was apparently hit by a car and buried in a field. Four days later, she was found emaciated and dirt-covered by a farm worker. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity. A fundraising page has raised more than $10,000.

Keyword: apparently (10%), field (40%), miracle dog (60%)
Generated summary: A stray dog in Washington State apparently died after being hit by a car. Four days later, the dog was found in a field, dirt-covered and emaciated. "She's a true miracle dog and she deserves a good life," the dog's foster owner says.

Keyword: buried in (80%)
Generated summary: Stray dog apparently hit by a car, apparently killed and buried in a field. Four days later, the dog manages to stagger to a nearby farm. She was found by a farm worker, who took her to a vet for help. Theia is now receiving treatment.

Keyword: whacked (70%)
Generated summary: A stray dog in Washington State apparently died after being hit by a car. Four days later, she was found emaciated and dirt-covered by a farm worker. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity. A fundraising page has raised more than $10,000.

Table 10: Examples of texts generated from the ROCStories dataset. As can be seen, even if the keyword "saving money" is given, for example, it may be used as another expression, such as "saved money". Note that in the quantitative evaluation of this paper, such cases where the keyword was paraphrased are classified as "the keyword was not included in the text". There are also cases where a given keyword is included in the text more than once.

Figure 1: An illustration of our method. The model is provided with control tokens: keywords in the target text, positions of each keyword, and the target text length to control text generation.
the dealership to look at a car. I was very nervous about buying a car. I asked the salesman if I could try out a certain car. The salesman told me that I could try out a blue car. I drove the blue car home and loved it so much, I bought it.

Keyword: dealership (30%)
Generated text: I went to the dealership to buy a new car. I was very nervous about the car and the price. I went to the dealership and looked at the price. I decided to buy the car and I was very happy with it. I am glad I went to the dealership because it was a great deal.

Keyword: dealership (80%)
Generated text: I went to the dealership to look at a car. I was very nervous because I didn't know what I wanted. I asked the salesman if I could pick out a car. The salesman told me that I could get a new car if I paid $40,000. I drove to the dealership and bought a new car.

Keyword: drive to work (50%), saved enough (70%)
Generated text: I was driving to work one day when I saw a car in the road. I pulled over and asked if I could drive to work. The driver told me that he had saved enough money to buy a new car. I drove to work and paid him back. I drove to work and paid him back and he was very happy.

Keyword: drive to work (20%), saved enough (0%)
Generated text: I saved enough money to buy a new car. I went to the car dealership to test drive my new car. I drove the car for a few hours before I left. When I got home, I realized I had forgotten my wallet. I had to drive to work to get my wallet back, but I was happy.

Table 1: Evaluation of the control of keywords and their positions. We evaluate the accuracy of generating text Including all of the target keywords and the accuracy of generating text in which all of the target keywords are placed in each target Position.

Table 3: Examples of generated summaries from the CNN/DM dataset.

Table 4: Summarization evaluation by ROUGE score. To reduce the effect on the ROUGE score due to giving target keywords, we exclude target keywords from both the target and generated summaries.

Table 5: Dataset statistics: the number of training data, the number of development data, the number of test data, the number of words in the source document and its standard deviation, and the number of words in the target text and its standard deviation.

Table 6: Evaluation of the diversity of generated texts using the Self-BLEU metric.

Table 7: Comparison between specifying oracle keyword positions and random keyword positions. The keyword given to the model is a single oracle keyword extracted from the target text and is the same in both settings.

Table 8: Examples of texts generated from the CNN/DM dataset. This table is the complete version of Table 3 with source document, gold summary, and additional examples.