BioGen: Generating Biography Summary under Table Guidance on Wikipedia

Capturing the salient information from an input article has been a long-standing challenge for summarization. On Wikipedia, most pages about people contain a factual table that lists the person's basic properties. Illuminatingly, such a factual table can be regarded as a natural summary of the key information in the corresponding article. Thus, in this paper we propose the task of table-guided abstractive biography summarization, which utilizes factual tables to capture important information and then generates a summary of a biography. We first introduce the TaGS (Table-Guided Summarization) dataset,¹ the first large-scale biography summarization dataset with tables. Next, we report statistics about this dataset to validate its quality. We also benchmark several commonly used summarization methods on TaGS and hope this will inspire more effective methods.


Introduction
Text summarization generates a short version of a long passage that retains its most important information. Two kinds of approaches have been proposed for automatic text summarization. One is extractive summarization (Nallapati et al., 2017), which directly selects salient sentences from the passage to create a summary. The other is abstractive summarization (See et al., 2017; Hsu et al., 2018a), which aims to concisely paraphrase the input article. In both cases, the summary should focus on important information, even though a document may include trivial facts.
To focus on the main information when generating summaries, some researchers have proposed incorporating various kinds of side information to improve performance. Narayan et al. (2017) proposed incorporating figures, and Gao et al. (2019b) investigated the use of reader comments for more effective summarization. As another type of side information, factual tables provide a natural summary of a biography document. On Wikipedia, each wiki page about a person has a factual table (infobox) on the right side of the page summarizing the person's main properties. Clearly, the infobox is helpful for capturing salient information when summarizing a biography. However, no existing work takes advantage of tables, even though they are widely available in Wikipedia biographies.
* Equal contribution. Ordering is decided by a coin flip.
† Corresponding Author: Dongyan Zhao
¹ https://github.com/gsh199449/table-summ
In this paper, we propose the Table-Guided Summarization (TaGS) dataset, the first large-scale biography summarization dataset with tables. We report statistics and three important characteristics of this dataset to verify its quality. First, it exhibits weak lead bias, which makes it suitable for training both abstractive and extractive summarization methods. Second, it has strong abstractness, which is helpful for generating more condensed summaries. Most importantly, each summary is guided by a table that contains the most salient facts described in the biography.
To verify the quality of this dataset, we employ several commonly used state-of-the-art summarization methods on our proposed dataset. The experimental results show that methods which simply incorporate the table information outperform methods that do not use it. This demonstrates the effectiveness of table guidance when generating summaries of documents that contain a factual table.
Our contributions are summarized as follows:
• To the best of our knowledge, we are the first to use factual tables to guide the summarization procedure so as to generate better summaries.
• We release a large-scale abstractive biography summarization dataset with tables. Experiments conducted on this dataset demonstrate the effectiveness of incorporating table information in generating summaries.

Text Summarization
Text summarization is an important task whose methods can be classified into extractive and abstractive approaches. Extractive summarization (Narayan et al., 2018b; Chen et al., 2018; Jadhav and Rajan, 2018) tends to generate a summary by selecting the most salient sentences in the document. Cheng and Lapata (2016) first proposed using a recurrent neural network (RNN) to extract salient sentences. Since then, researchers have explored many neural methods (Nallapati et al., 2017; Chen et al., 2018; Zhang et al., 2018), achieving state-of-the-art performance (Liu and Lapata, 2019) on the benchmark CNN/DailyMail dataset. In the meantime, Nallapati et al. (2016) first applied neural text generation methods to the abstractive summarization task, and Gehrmann et al. (2018) achieved state-of-the-art performance by using a data-efficient content selector.

Summarization with Side Information
Traditional text summarization methods only use the document as input. However, the gist of the document may lie in side information, such as the title, image captions, or comments, which are often available for news-wire articles. As such, various studies (Hu et al., 2008) have tried to use such information for more efficient and accurate summarization. However, to the best of our knowledge, no existing work considers the use of tables to guide biography summarization.

TaGS Dataset
Our dataset, named Table-Guided Summarization (TaGS), consists of over 500,000 document-summary pairs, along with their corresponding factual tables collected from Wikipedia. Concretely, following Chen et al. (2019), we use the leading paragraphs before the content outline as the summary, and the following paragraphs as the document. The infobox in the right part of the webpage is extracted as the guiding table.
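The document-summary split described above can be sketched with a simple plain-text heuristic (a minimal sketch, assuming MediaWiki-style `==` headings mark the start of the content outline; `split_wiki_page` is our own illustrative name):

```python
import re

def split_wiki_page(page_text):
    """Split a Wikipedia biography page into (summary, document).

    The lead paragraphs before the first section heading serve as the
    summary; the remaining sections form the document.
    """
    # the first "== Heading ==" line marks the start of the body
    match = re.search(r"^==[^=].*$", page_text, flags=re.MULTILINE)
    if match is None:
        return page_text.strip(), ""
    summary = page_text[:match.start()].strip()
    document = page_text[match.start():].strip()
    return summary, document

page = "Ada Lovelace was a mathematician.\n\n== Early life ==\nShe was born in 1815."
summary, document = split_wiki_page(page)
```

In practice one would parse the wiki markup properly (templates, references, and the infobox itself must be stripped first), but the lead-section heuristic above captures the split used here.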
Some key statistics of the factual tables are described below. 7.31% of the words in a document and 29.41% of the words in a summary are included in the factual table. The average number of fields in a table is 12.89, and a table contains 46.83 words on average. We show detailed statistics of the document-summary pairs in TaGS and compare them with other popular text summarization datasets in Table 1. We next discuss some abstractive characteristics of TaGS compared to existing summarization datasets.
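The overlap statistics above can be computed with straightforward token matching (a sketch, assuming lowercased whitespace tokenization; the table fields and sentences are our own illustrative examples):

```python
def table_coverage(table_values, text):
    """Fraction of text tokens that also appear in the table values."""
    table_tokens = {tok.lower() for value in table_values for tok in value.split()}
    text_tokens = [tok.lower() for tok in text.split()]
    if not text_tokens:
        return 0.0
    hits = sum(1 for tok in text_tokens if tok in table_tokens)
    return hits / len(text_tokens)

table = {"name": "Ada Lovelace", "born": "1815", "field": "mathematics"}
summary = "Ada Lovelace born 1815 studied mathematics"
coverage = table_coverage(table.values(), summary)  # 4 of 6 tokens covered
```

Averaging this quantity over all pairs yields the 7.31% (document) and 29.41% (summary) figures reported above.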
Weak Lead Bias. Lead bias means that directly using the leading sentences of a document already produces good performance in terms of the summarization evaluation metric ROUGE (Lin, 2004). This is a common problem in text summarization datasets, occurring mostly in news-based documents. Figure 1 plots the density histograms of the relative locations, within the input document, of words from the ground-truth summary. In the CNN/DailyMail and Newsroom datasets, these words are highly concentrated in the leading parts of the input document. In contrast, our TaGS dataset shows a more uniform distribution across the document. This characteristic can also be seen in the LEAD score shown in Table 2, where LEAD is a baseline that selects the first few sentences of the input document as the summary; a high LEAD score implicitly indicates a strong lead bias. From Table 2, we find that TaGS has a much lower LEAD score than the CNN/DailyMail and Newsroom datasets, and thus prevents a model from learning salient information merely through locational bias.
Strong Abstractness. Table 2 reports the percentage of novel n-grams in the ground-truth summary, i.e., n-grams that do not appear in the input document. The results show that 43.38% of the unigrams in our summaries are novel, 122.46% higher than in the commonly used benchmark dataset CNN/DailyMail. This indicates that summaries in TaGS are more abstractive. Besides, two other metrics, density and coverage, proposed by Grusky et al. (2018), are commonly used when evaluating summarization datasets (Kim et al., 2019; Grusky et al., 2018). We plot the distributions of these two metrics in Figure 2, where small density and coverage indicate that the summaries have strong abstractness. The results show that TaGS is similar to XSUM and Reddit in terms of density and coverage, and these datasets all have strong abstractness.
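The novel n-gram percentage used above can be computed as follows (a minimal sketch over whitespace tokens; the example sentences are our own):

```python
def novel_ngram_ratio(summary, document, n):
    """Percentage of summary n-grams that never appear in the document."""
    def ngrams(text):
        toks = text.lower().split()
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    doc_ngrams = set(ngrams(document))
    summ_ngrams = ngrams(summary)
    if not summ_ngrams:
        return 0.0
    novel = sum(1 for g in summ_ngrams if g not in doc_ngrams)
    return 100.0 * novel / len(summ_ngrams)

doc = "she was born in london and studied mathematics at home"
summ = "a london born mathematician"
ratio = novel_ngram_ratio(summ, doc, 1)  # "a" and "mathematician" are novel
```

A higher ratio means the summary paraphrases rather than copies, which is exactly the abstractness property TaGS is claimed to have.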
PG/LEAD in Table 2 is the ratio of the ROUGE-L score of PG to that of LEAD, which quantifies the difficulty of a dataset for extractive methods and its suitability for abstractive methods. CNN/DailyMail and Newsroom achieve low PG/LEAD scores, demonstrating that these datasets are more suitable for extractive models. On the contrary, XSUM and TaGS have high PG/LEAD scores, showing that TaGS is potentially an excellent benchmark for evaluating abstractive summarization systems.

Comparison Methods
To evaluate the effectiveness of incorporating tables, we conduct experiments using the following baselines:
(1) LEAD3: selects the first three sentences of a document as the summary.
(2) S2S: the traditional sequence-to-sequence framework (Sutskever et al., 2014), which has been used in many text generation tasks (Gao et al., 2019c, 2021; Chan et al., 2019b, 2020).
(3) PG: the pointer-generator network (See et al., 2017).
(4) Unified: the unified extractive-abstractive model (Hsu et al., 2018a).
(5) Transformer: the self-attention based model proposed in Vaswani et al. (2017).
(6) CopyTransformer: a state-of-the-art generative summarization model (Gehrmann et al., 2018), which combines the Transformer with a copy mechanism.
(7) TabWords: simply concatenates all the words in the table as the summary.
(8) TGSG: our proposed table-guided summarization model.
Additionally, we select the two best baselines, Unified and CopyTransformer, and concatenate the original input document with the table as input, denoted as (9) Unified+T and (10) CopyTransformer+T, to determine whether the improvement of TGSG simply arises from the table information.
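The two non-neural baselines are simple enough to sketch directly (assuming sentences are pre-split; `lead3` and `tab_words` are our own illustrative names):

```python
def lead3(sentences):
    """LEAD3: take the first three sentences of the document as the summary."""
    return " ".join(sentences[:3])

def tab_words(table):
    """TabWords: concatenate all the words in the table fields as the summary."""
    return " ".join(value for value in table.values())

doc_sentences = [
    "He was born in 1900.",
    "He wrote novels.",
    "He won a prize.",
    "He died in 1980.",
]
table = {"born": "1900", "occupation": "novelist"}
summary = lead3(doc_sentences)
```

LEAD3 measures how far positional bias alone goes, while TabWords measures how far the table alone goes; the gap between them and the neural models is what the benchmark is designed to expose.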

Evaluation Metrics
For evaluation metrics, we adopt ROUGE scores (Lin, 2004), which are widely applied in summarization evaluation (Sun et al., 2018; Chen et al., 2018; Gao et al., 2019a). The ROUGE metrics compare the generated summary with the reference summary by counting overlapping lexical units, and include ROUGE-1, ROUGE-2, and ROUGE-L.
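As a reference point, ROUGE-1 can be approximated by clipped unigram overlap (a simplified sketch without stemming or stopword handling; real evaluations should use an official ROUGE implementation):

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1: F1 over unigram counts (overlap clipped per token)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped match counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat", "the cat sat on the mat")
```

ROUGE-2 replaces unigrams with bigrams, and ROUGE-L scores the longest common subsequence instead of n-gram overlap.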

Experimental Results
We first examine the performance of these baselines, as shown in Table 3. Firstly, among the models without table information, Unified achieves the highest performance, with PG and CopyTransformer achieving the second-best performance. Secondly, tables are indeed helpful for the summarization process. With the additional table information, the ROUGE-1 scores of Unified+T and CopyTransformer+T improve by 4.22 and 14.3, respectively. This observation demonstrates that factual tables help the summarization model capture the main idea of the document by emphasizing its key facts. However, the performance of TabWords is much lower than that of Unified+T and CopyTransformer+T, demonstrating that the table alone is not sufficient to produce a good summary without the document content.

Conclusion
In this paper, we proposed to use factual tables to guide biography summarization. To demonstrate the effectiveness of incorporating table information when generating biography summaries, we developed the first large-scale abstractive biography summarization dataset with tables. We employed several state-of-the-art summarization methods and adapted them to the table-guided biography summarization task. These adapted methods outperformed, in terms of ROUGE, summarization methods that only use the document as input and ignore the table guidance.

Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC No. 61876196 and No. 61672058). Rui Yan is partially supported as a Young Fellow of the Beijing Academy of Artificial Intelligence (BAAI).

Ethical Impact
In this paper, we propose a table-guided biography summarization dataset built from Wikipedia. In any real-world application that provides an automatic biography summarization service, human editors should double-check each generated summary to ensure the correctness of its content and grammar before the summary is published.