Sentence Ordering with a Coherence Verifier



Introduction
Coherence is essential for effective communication. The correct order of sentences is a necessary attribute of text coherence. Sentence ordering aims to organize a set of possibly unordered sentences into a coherent text. It is closely associated with coherence modeling. On one hand, it has been used as an objective for learning coherence models. On the other hand, it can be viewed as a follow-up module of coherence evaluation, e.g., for improving texts with low coherence scores. Sentence ordering thus has high practical value in downstream tasks for evaluating and improving the quality of human writing (Amorim et al., 2018; Mim et al., 2019) or machine-generated content (Reiter and Dale, 1997; Fan et al., 2019; Hu et al., 2020; Guan et al., 2021).
Recent sentence ordering studies can be classified into two categories: pair-wise ranking-based and sequence generation-based methods.
Pair-wise ranking-based methods first model the relative order of each sentence pair and then integrate all the predicted relative orders with some ranking method to get the final order (Chen et al., 2016; Prabhumoye et al., 2020; Ghosal et al., 2021; Zhu et al., 2021). For example, B-TSort (Prabhumoye et al., 2020) uses BERT for pair-wise classification, builds a constraint graph to integrate pair-wise predictions, and adopts the topological sorting algorithm for sentence ranking.
Sequence generation-based methods are mainly based on pointer networks (Vinyals et al., 2015). An encoder encodes all unordered sentences in various ways to capture the paragraph-level contextual information (Cui et al., 2018; Yin et al., 2019; Wang and Wan, 2019; Yin et al., 2021; Lai et al., 2021), then a decoder iteratively selects the next sentence from the set of unordered sentences conditioned on the states of the encoder and the already ordered sentence sequence.
However, both categories of methods share a shortcoming: the coherence of the ordered sentences is not directly optimized but is approximated by optimizing auxiliary tasks, e.g., pair-wise ordering and ranking algorithms, or by optimizing a series of conditional decisions, e.g., iterative sentence selection by a pointer network. These sub-optimal objectives are misaligned with the purpose of finding an order with maximal global coherence.
In this paper, we propose a simple sentence ordering method by introducing a Coherence Verifier (COVER). It can be plugged into ranking-based and sequence generation-based models. Figure 1 shows an example of how COVER works together with a sequence generation baseline. COVER only intervenes in the generation process. At each inference step, we let the baseline provide top candidates for the next sentence (e.g., s_4 and s_3), use COVER to verify the coherence of the resulting sentence sequence candidates (e.g., s_1, s_2, s_4 and s_1, s_2, s_3), and re-rank the candidates for future generations. As a result, our method combines local conditional evidence and global coherence.
COVER is trained to measure coherence independently of the sentence ordering task. This is reasonable and important since the input of a coherence model is an ordered sentence sequence rather than a set of unordered sentences, and the model can be pre-trained with multi-domain datasets. We propose a novel coherence model with a new graph formulation to model sentence pair orders, sequence order, and paragraph-to-sentence relations, and a novel gradual permutation-based data construction strategy for effective contrastive pre-training from pairs of sentence orders with different coherence degrees.
We evaluate the effectiveness of COVER by letting it work with a topological sorting-based baseline, B-TSort (Prabhumoye et al., 2020), and a pointer network-based sequence generation baseline, BERSON (Cui et al., 2020). Experimental results on four benchmarks demonstrate that our method improves both baselines and, especially, obtains a large gain for the topological sorting-based baseline. It also outperforms other recent methods.
We conduct a series of in-depth analyses showing that our method can correct a large ratio of the sentence pair classification errors made by B-TSort and improve ordering accuracy at the early decoding stage for BERSON, which alleviates the gap between training and inference and reduces error propagation. These effects come from the key designs of our coherence model. Moreover, the COVER pre-trained with larger cross-domain datasets obtains better performance than the models trained with domain-specific datasets. The results verify the importance of pre-training the independent coherence model and also indicate that sentence ordering and coherence modeling can cooperate and interact well.

Coherence Modeling
The main coherence modeling methods can be classified into the following categories. Entity grid-based Methods measure local coherence by tracking the transitions of the grammatical roles of entities between sentences (Barzilay and Lapata, 2008; Lin et al., 2011). Tien Nguyen and Joty (2017) proposed the first neural entity model based on convolutional neural networks (CNNs). Jeon and Strube (2022) proposed to compute coherence by constraining the input to noun phrases and proper names, since these explicitly lead to the notion of focus in sentences. Graph-based Methods are another framework for modeling local coherence. Guinaudeau and Strube (2013) described relations between sentences and entities with graphs and measured local coherence by computing the average out-degree of graphs. Mesgar et al. (2021) adopted graph convolutional networks (GCNs) for encoding entity graphs to model local coherence. Data-driven Methods focus on learning domain-independent neural models of discourse coherence (Li and Jurafsky, 2017; Farag and Yannakoudakis, 2019). The key is to define proper learning objectives, including discriminative models to distinguish coherent from incoherent discourse, generative models to produce coherent texts (Li and Jurafsky, 2017), and multi-task learning with auxiliary tasks (Farag and Yannakoudakis, 2019).

Sentence Ordering
The sentence ordering task takes possibly out-of-order sentences s = (s_1, s_2, ..., s_n) as input, and aims to find the best order o* = (o_1, o_2, ..., o_n) that makes the sentence sequence s_{o_1}, s_{o_2}, ..., s_{o_n} maximally globally coherent.
Recent sentence ordering methods are mainly based on neural networks and can be classified into the following two categories.

Pair-wise Ranking based Methods
The main procedure of this category of methods is: Step 1: Learn a pair-wise classifier to determine the relative order of each sentence pair. The classifier can be trained based on BERT (Prabhumoye et al., 2020; Zhu et al., 2021) or GCNs (Ghosal et al., 2021).
Step 2: Integrate the relative orders to build relations between sentences. A common way is to build a constraint graph based on the relative orders.
Step 3: Rank the sentences based on the graph with a ranking algorithm like topological sorting (Prabhumoye et al., 2020; Ghosal et al., 2021), by using a neural network to score sentences (Zhu et al., 2021), or by modeling it as the asymmetric traveling salesman problem (Keswani and Jhamtani, 2021).

Sequence Generation based Methods
Sequence generation-based models mainly depend on pointer networks (Vinyals et al., 2015). The encoder maps a set of sentences into a fixed-length vector representation in various ways (Cui et al., 2018; Yin et al., 2019, 2021; Lai et al., 2021; Basu Roy Chowdhury et al., 2021; Cui et al., 2020). The decoder iteratively generates the sentence sequence based on the attention scores over input sentences.
Formally, the decoders model an autoregressive factorization of the joint coherence probability of a predicted order ô:

P(ô | s) = ∏_{i=1}^{n} a_{U_i}(ô_i | ô_{<i}, s),    (1)

where ô_{<i} is the sequence of already ordered sentences, U_i is the set of unselected sentences at step i, ô_i is the i-th sentence in ô, and a_{U_i}(s_i | ô_{<i}, s) is the attention score for a candidate sentence s_i ∈ U_i.
Beam search can be used for enlarging the search space and ranking partially generated hypotheses during decoding. But the ranking in beam search is still based on conditional evidence (Equation 1).
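The decoding described above can be sketched as a standard beam search over partial orders. This is a minimal illustration, not the paper's implementation: `score_fn` is a hypothetical stand-in for the pointer network's conditional (attention) score, and sentences are represented by integer indices.

```python
def beam_search(sentences, score_fn, k=2):
    """Order sentences by beam search over conditional scores.

    score_fn(prefix, cand) returns the (log-)score for selecting `cand`
    as the next sentence given the already ordered prefix.
    """
    beams = [((), 0.0)]  # (partial order, accumulated score)
    for _ in range(len(sentences)):
        expanded = []
        for prefix, score in beams:
            for cand in sentences:
                if cand in prefix:
                    continue
                expanded.append((prefix + (cand,), score + score_fn(prefix, cand)))
        # Keep only the k best partial hypotheses.
        beams = sorted(expanded, key=lambda x: x[1], reverse=True)[:k]
    return beams[0][0]
```

Note that the ranking here uses only the accumulated conditional scores, which is exactly the limitation the paper addresses with COVER.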
The Proposed Framework

The Motivation
The existing sentence ordering methods make the best decisions based on conditional evidence or local constraints, but do not directly optimize global coherence. This is natural because the model cannot see the complete global information before generating the final ordering.
When people do the same task, we also start from incomplete information.However, once we have a partial or final ordering, we often revisit the already-ordered sentences to verify whether the current text is coherent or needs to be revised.The verification step is intuitive and important since we can see more complete information.
Motivated by the above observations, we propose a simple sentence ordering framework by incorporating an independent coherence verifier, which we call COVER. COVER reads an ordered sentence sequence and gives a coherence score. We expect COVER to verify the predicted results of a baseline model and re-rank the candidates to get a more coherent order.
We will introduce the details of COVER in §4. In this section, we focus on demonstrating that COVER can be flexibly incorporated with sequence generation-based (§3.2) and topological sorting-based (§3.3) models through beam search.

COVER for Sequence Generation-based Models
As Figure 1 shows, COVER can be easily incorporated into a pointer network-based baseline model.
It only intervenes in the decoding process. At each decoding step, we compute the score of a candidate sentence s_i as

g(s_i) = a_{U_i}(s_i | ô_{<i}, s) + α · COVER(ô_{<i}, s_i),    (2)

where a_{U_i}(s_i | ô_{<i}, s) is the attention score and α is a weighting hyper-parameter. We put s_i at the end of ô_{<i}, and COVER(ô_{<i}, s_i) returns a coherence score for the resulting sentence sequence. COVER can be incorporated through beam search, and g(s_i) in Equation 2 becomes g(ô_{<i}, s_i).
A beam B = {ô_{<i}} stores the top k preceding orders, where k is the beam size, and each candidate s_i ∈ U_i is combined with the items in B. We score each combination (ô_{<i}, s_i) based on g(ô_{<i}, s_i) and store the top k combinations in B.
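One decoding step with coherence verification can be sketched as follows. This is a hedged illustration of the scoring in Equation 2: `attn_score` and `coherence_score` are placeholders for the baseline's attention score and COVER's output, not real model calls.

```python
def cover_beam_step(beam, candidates, attn_score, coherence_score, alpha=0.1, k=16):
    """Expand each partial order in the beam with every unused candidate,
    score the combinations with attention plus weighted coherence, and
    keep the top k (Equation 2 style)."""
    scored = []
    for prefix, _ in beam:
        for cand in candidates:
            if cand in prefix:
                continue
            seq = prefix + (cand,)
            # Local conditional evidence + global coherence verification.
            g = attn_score(prefix, cand) + alpha * coherence_score(seq)
            scored.append((seq, g))
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:k]
```

Repeating this step until all sentences are placed yields the re-ranked decoding described in the text.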

COVER for Pair-wise Ranking-based Methods
For a pair-wise model, COVER does not affect the pair-wise classifier and only affects the ranking part, as long as the model can provide multiple ordering candidates. In this paper, we focus on improving topological sorting-based methods.
The topological sorting algorithm reads a constraint graph G = (V, E), where an edge from v_i ∈ V to v_j ∈ V indicates that sentence s_i is predicted to precede s_j in the document. At each step, a node without any incoming edges is selected. The algorithm then removes this node and its associated edges from G and repeats the above process until all nodes are processed.
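The plain topological sorting baseline can be sketched as below. This is a simplified illustration, assuming sentences are integer indices and the pair-wise classifier's output is given as a set of "i precedes j" constraints; real systems such as B-TSort break ties and cycles differently.

```python
from collections import defaultdict

def topological_order(n, pairwise_before):
    """Order n sentences given pair-wise precedence predictions.

    pairwise_before: iterable of (i, j) pairs meaning sentence i is
    predicted to come before sentence j (the constraint graph).
    """
    in_degree = [0] * n
    successors = defaultdict(list)
    for i, j in pairwise_before:
        successors[i].append(j)
        in_degree[j] += 1
    order = []
    remaining = set(range(n))
    while remaining:
        # Select a node without incoming edges among unprocessed nodes.
        free = [v for v in sorted(remaining) if in_degree[v] == 0]
        v = free[0] if free else min(remaining, key=lambda x: in_degree[x])
        order.append(v)
        remaining.remove(v)
        # Remove the node's outgoing edges from the graph.
        for u in successors[v]:
            in_degree[u] -= 1
    return order
```

Note that this procedure never looks at sentence content: a single wrong pair-wise prediction can push a sentence far from its correct position.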
We can see that this ordering process is also a generation process. As a result, we slightly modify the generation process and describe it in Algorithm 1.

Algorithm 1: COVER for Topological Sorting through Beam Search
Input: a directed graph G = (V, E), beam size k, and the number of steps t to look ahead; start returns the next free node in a graph based on the topological sorting algorithm, and top_k returns the top k ranked items in a list.

We introduce a beam B to store the top k partial orderings (line 1). A key operation is letting the topological sorting algorithm look ahead t steps to obtain more and longer partial ordering candidates, which are stored in a temporary list b (lines 3 to 11). COVER then scores the partial ordering candidates and keeps the top k of them in B. In this way, COVER plays a role in the whole generation process and corrects the errors made by the pair-wise classifier in time by measuring coherence, which is ignored by topological sorting.
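A hedged sketch of this look-ahead procedure is given below. It is an illustration under simplifying assumptions (integer sentence indices, constraints given as a set of pairs, a toy `coherence_score` standing in for COVER), not a faithful reproduction of Algorithm 1.

```python
def cover_topological_sort(n, pairwise_before, coherence_score, k=2, t=2):
    """Topological sorting with a beam of partial orderings: look ahead
    t steps to generate longer candidates, then let a coherence scorer
    re-rank them and keep the top k."""
    before = set(pairwise_before)

    def free_nodes(partial):
        remaining = [v for v in range(n) if v not in partial]
        # Nodes with no incoming edge from another unprocessed node.
        free = [v for v in remaining
                if not any((u, v) in before for u in remaining if u != v)]
        return free or remaining  # fall back if the constraint graph has a cycle

    beam = [()]
    while any(len(p) < n for p in beam):
        # Look ahead t steps to collect longer partial ordering candidates.
        frontier = list(beam)
        for _ in range(t):
            nxt = []
            for p in frontier:
                if len(p) == n:
                    nxt.append(p)
                else:
                    nxt.extend(p + (v,) for v in free_nodes(p))
            frontier = nxt
        # The coherence verifier scores the candidates; keep the top k.
        frontier.sort(key=coherence_score, reverse=True)
        beam = frontier[:k]
    return beam[0]
```

When the pair-wise constraints leave several nodes free at the same step, the coherence score, rather than an arbitrary tie-break, decides which partial ordering survives.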

COVER: The Coherence Model
We propose a new graph-based coherence model as COVER. Specifically, we propose a new graph formulation and model it with GNNs for coherence evaluation (§4.1). We also propose a new data construction strategy for contrastive pre-training of the coherence model (§4.2).

Graph Formulation and Modeling
Given the ordered sentences in a paragraph d, we construct a graph G_d = (V, E, R). V is a set of nodes, E is a set of directed edges connecting nodes, and R is the set of edge types. Figure 2 shows an example of the graph for a paragraph with 5 sentences. The graph is a tournament digraph, in which every pair of distinct nodes is connected by a directed edge.
We consider two types of nodes, V = {v_d} ∪ V_s: • Sentence nodes V_s: Each sentence s_i with an ordered index i has a node v_i ∈ V_s.
• Paragraph node v d : The paragraph has a node to represent the general topic of the paragraph.
We also consider three types of directed edges, with edge types R = {r_d, r_s, r_k}: • Paragraph-to-sentence edges: We build a directed labeled edge (v_d, r_d, v_i) from the paragraph node (para-node) to each sentence node, where r_d indicates the edge type.
• Sequential edges: We build a directed labeled edge (v_i, r_s, v_{i+1}) with type r_s from sentence s_i to s_{i+1}.
• Skip edges: We build a directed labeled edge (v_i, r_k, v_j) with type r_k from sentence s_i to s_j, if j > i + 1.
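The three edge types above can be enumerated mechanically. The following minimal sketch builds the typed edge list for a paragraph (the relation labels `"r_d"`, `"r_s"`, `"r_k"` and the paragraph node name `"d"` are just illustrative identifiers):

```python
def build_graph(num_sentences):
    """Build the tournament digraph for a paragraph: one node per sentence
    plus a paragraph node 'd', with typed directed edges."""
    edges = []
    for i in range(num_sentences):
        edges.append(("d", "r_d", i))          # paragraph-to-sentence
    for i in range(num_sentences - 1):
        edges.append((i, "r_s", i + 1))        # sequential (local coherence)
    for i in range(num_sentences):
        for j in range(i + 2, num_sentences):
            edges.append((i, "r_k", j))        # skip (long-distance order)
    return edges
```

For a 5-sentence paragraph this yields 5 paragraph-to-sentence edges, 4 sequential edges, and 6 skip edges, so every pair of distinct nodes is connected, matching the tournament digraph of Figure 2.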
Sequential edges are the most natural choice for describing local coherence (Mesgar et al., 2021). We further use densely connected skip edges to describe long-distance ordering information, so that every sentence s_j can directly receive information from all preceding sentences in the same paragraph rather than only receiving summarized information from s_{j−1}. This formulation is rarely explored in previous coherence modeling work.

Node Representations We map the nodes to dense vectors. Specifically, we use DeBERTa (He et al., 2021) to get the representation of each sentence node. Each sentence is fed to DeBERTa independently and the hidden state of the [CLS] token is used as the node representation. For the paragraph node, we let DeBERTa read the entire paragraph to get the representation of the para-node, so the positional embeddings naturally encode the ordering information.

Graph Modeling Following previous work (Mesgar et al., 2021; Ghosal et al., 2021), we use Relational Graph Convolutional Networks (RGCN) (Schlichtkrull et al., 2018) to further encode the relations between nodes, which is a natural choice for modeling the typed edges between nodes.
The RGCN model can accumulate relational evidence from the neighborhood around a given node v_i over multiple inference steps, i.e.,

h_i^{(l+1)} = σ( W_0^{(l)} h_i^{(l)} + Σ_{r∈R} Σ_{j∈N_i^r} (1/|N_i^r|) W_r^{(l)} h_j^{(l)} ),

where h_i^{(l)} represents the hidden state of node v_i in the l-th layer of the neural network. We use the representation of node v_i from DeBERTa as h_i^{(0)}. r ∈ R is one of the edge types and N_i^r represents the set of nodes connected to v_i through edge type r. W_r^{(l)} is the parameter matrix for r and W_0^{(l)} is the parameter matrix for the self-connection edge, which is an extra type in addition to R. σ(·) is set as ReLU(·). RGCN stacks L layers and we set L = 2, the same as Ghosal et al. (2021).

Coherence Evaluation After getting the final representations of all nodes, we obtain the representation of the graph G via h_G = Σ_{v∈V} h_v and map it to a coherence score Coh(G), i.e., Coh(G) = FFN(h_G), where FFN is a single-layer feed-forward neural network.
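To make the relational update concrete, here is a deliberately tiny sketch with scalar node features and scalar weights, so the per-relation aggregation and the 1/|N_i^r| normalization are visible. The real model uses matrix-valued W_r and multi-dimensional hidden states (e.g., via an RGCN library); everything below is a toy assumption.

```python
def rgcn_layer(h, edges, w_rel, w_self):
    """One simplified RGCN layer with scalar node features.

    h: dict node -> float; edges: list of (src, rel, dst);
    w_rel: dict rel -> float weight. Messages flow along edge direction.
    """
    # Count neighbors per (node, relation) for the 1/|N_i^r| normalization.
    counts = {}
    for src, rel, dst in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    new_h = {}
    for v in h:
        agg = w_self * h[v]  # self-connection term (W_0)
        for src, rel, dst in edges:
            if dst == v:
                agg += w_rel[rel] * h[src] / counts[(v, rel)]
        new_h[v] = max(0.0, agg)  # ReLU
    return new_h

def coherence_score(h, ffn_weight=1.0):
    """Sum node states (h_G) and apply a one-layer 'FFN' (a single weight)."""
    return ffn_weight * sum(h.values())
```

Stacking two such layers and applying `coherence_score` mirrors the paper's L = 2 setting and the sum-pooled Coh(G).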

Model Training
Training Objective We train our model in a pair-wise ranking manner. Given a text d+ with a higher coherence degree than a text d−, we use the following loss function for updating model parameters:

L = max(0, τ − Coh(G_{d+}) + Coh(G_{d−})),

where G_{d+} and G_{d−} are the corresponding graphs for d+ and d−, and τ = 0.1 is the margin.
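The margin ranking loss above is a one-liner; a minimal sketch:

```python
def margin_ranking_loss(coh_pos, coh_neg, tau=0.1):
    """Pair-wise ranking loss: zero when the more coherent text scores
    at least tau higher than the less coherent one."""
    return max(0.0, tau - coh_pos + coh_neg)
```

The loss is zero once the margin is satisfied, so training only pushes on pairs the model still ranks too closely or in the wrong order.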
Training Instance Construction The model can be trained using documents with manually annotated coherence degrees. However, the scale of such data is very limited. Another common way is distinguishing a coherent document from its permutations, where a coherent document and one of its random sentence permutations form a training instance. We call this strategy random permutation.
We propose a gradual permutation strategy that gradually corrupts a coherent document through pair-wise sentence permutation. Figure 3 illustrates an example of gradual permutation. A pair-wise permutation operation randomly selects a pair of sentences that has not been selected before in the current order and exchanges them to get a new order. We assume the new order is less coherent than the previous one. By repeating this process, we can get a sequence of order samples o_1, o_2, ... with descending coherence degrees. Finally, we sample pairs of orders from this sequence to form pair-wise training instances according to their relative coherence degrees. For one document, gradual permutation can be done multiple times.
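The construction can be sketched as follows. This is an illustrative interpretation of the strategy: the "not selected before" bookkeeping and the pairing of all ordered pairs are assumptions about details the text leaves open.

```python
import random

def gradual_permutation(order, num_swaps, seed=0):
    """Corrupt a coherent order by repeated pair-wise swaps, yielding a
    list of orders with (assumed) descending coherence degrees."""
    rng = random.Random(seed)
    orders = [list(order)]
    swapped = set()  # sentence pairs already exchanged
    for _ in range(num_swaps):
        current = orders[-1]
        # Candidate sentence pairs not selected before.
        pairs = [(i, j) for i in range(len(current))
                 for j in range(i + 1, len(current))
                 if (current[i], current[j]) not in swapped]
        if not pairs:
            break
        i, j = rng.choice(pairs)
        swapped.add((current[i], current[j]))
        nxt = list(current)
        nxt[i], nxt[j] = nxt[j], nxt[i]
        orders.append(nxt)
    return orders

def sample_instances(orders):
    """Every earlier order is assumed more coherent than every later one."""
    return [(orders[a], orders[b])
            for a in range(len(orders)) for b in range(a + 1, len(orders))]
```

Unlike random permutation, the resulting instances also pit two imperfect orders against each other, teaching the model to grade degrees of incoherence.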
Compared with random permutation, gradual permutation pays more attention to evaluating relative coherence between imperfect orders with different coherence degrees, instead of only distinguishing a perfect order from imperfect ones.

Pre-Training
The training of the coherence model can be independent of the sentence ordering task.
As a result, COVER can be pre-trained with domain-independent resources and be maintained as a verifier for sentence ordering in specific domains.

Experimental Settings
Datasets We conduct experiments on four widely used benchmarks. NIPS and AAN contain abstracts from NIPS papers and ACL anthology network papers, respectively (Logeswaran et al., 2018). SIND was originally used for visual storytelling (Huang et al., 2016), where natural language descriptions are provided for the five images of each story. ROCStory is a dataset of short stories, each of which has five sentences (Mostafazadeh et al., 2016).

Evaluation Metrics We adopt the following three commonly used metrics for evaluation. Perfect Match Ratio (PMR): PMR measures the percentage of documents for which the entire sequence is correctly predicted (Chen et al., 2016).
Kendall's τ : It measures the difference between the predicted order and the gold order of sentences based on the number of inversions (Lapata, 2003).

Accuracy (ACC):
It measures the percentage of sentences whose absolute positions are correctly predicted (Logeswaran et al., 2018).

Baselines and Settings We use B-TSort (Prabhumoye et al., 2020)* and BERSON (Cui et al., 2020)† as our main baselines. We choose them because they are recent representative pair-wise ranking-based and pointer network-based methods, with top performance and almost reproducible results with publicly released code. We use the optimized parameters provided by the original papers and re-run the source code on our machine. We run these baselines three times with different random seeds and use the baseline models with the best performance in our experiments.
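The three metrics are straightforward to compute; a minimal sketch over predicted and gold orders (lists of sentence indices):

```python
from itertools import combinations

def pmr(preds, golds):
    """Perfect Match Ratio: fraction of documents ordered exactly right."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def accuracy(preds, golds):
    """Fraction of sentences placed at their correct absolute position."""
    total = sum(len(g) for g in golds)
    correct = sum(pi == gi for p, g in zip(preds, golds)
                  for pi, gi in zip(p, g))
    return correct / total

def kendall_tau(pred, gold):
    """Kendall's tau between one predicted and gold order (1 = identical)."""
    pos = {s: i for i, s in enumerate(pred)}
    n = len(gold)
    # Count gold-order pairs that appear inverted in the prediction.
    inversions = sum(pos[gold[i]] > pos[gold[j]]
                     for i, j in combinations(range(n), 2))
    return 1 - 2 * inversions / (n * (n - 1) / 2)
```

PMR is the strictest of the three (all-or-nothing per document), which is why gains in PMR are emphasized in the results.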
Our method lets COVER work together with B-TSort and BERSON, utilizing and adjusting their predictions with beam search. Following the setting of BERSON, we set the beam size to 16 for both baselines. The number of look-ahead steps t in Algorithm 1 is 2. The hyper-parameter α in Equation 2 is 0.1, chosen from {0.01, 0.1, 0.5, 1} based on the validation performance.
We use the AdamW optimizer for training the coherence model. The learning rate for the parameters of DeBERTa, which is used for getting node representations, is 1e-6, and the learning rate for the parameters of the RGCN model is 1e-4.
We pre-train COVER using the combination of the training sets of the four benchmarks on an A100 GPU for 40 hours, and train a domain-specific COVER_dom for each dataset using the corresponding training set. For one document, we sample two sentence permutations as negative instances.

General Results on Sentence Ordering
Table 1 shows the performance of our method, two main baselines, and other recent methods.
First of all, we can see that both COVER and COVER_dom improve the two baselines on all benchmarks. The pre-trained COVER outperforms the domain-specific COVER_dom in most cases, indicating that pre-training the coherence model is feasible and useful. We can maintain a single coherence model instead of domain-specific ones and even obtain a boost in overall performance.
Based on the beam search algorithm for topological sorting, COVER obtains 11.1%, 9.6%, and 18.1% average absolute improvements in Acc, Kendall's τ, and PMR compared with B-TSort.
Based on adding coherence verification to the beam search-based decoding, COVER achieves 2.0%, 1.3%, and 4.2% average absolute improvements in Acc, Kendall's τ, and PMR compared with BERSON. The improvements are smaller but still significant; in particular, our method achieves a significant performance improvement in PMR.

Effect of COVER for B-TSort
Our method obtains large improvements for B-TSort. To analyze these improvements more deeply, we conduct investigations on the NIPS dataset.
We start by analyzing the predictions made by B-TSort's pair-wise classifier. Specifically, we group sentence pairs according to the distance between the two sentences. We investigate the error ratio for each distance d, where

error ratio = (# incorrect pair-wise predictions) / (# all pairs within distance d),

and analyze the confidence of the pair-wise classifier, using its prediction probability as a confidence measure.
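This grouping can be sketched as follows. Note the simplifying assumption: the sketch groups pairs by their exact gold distance d = j − i, while the paper's "within distance d" may aggregate cumulatively; the input format (gold positions plus a correctness flag per pair) is also assumed.

```python
def error_ratio_by_distance(pair_preds):
    """pair_preds: iterable of (i, j, correct) with gold positions i < j.

    Groups pairs by distance d = j - i and returns the error ratio per group.
    """
    groups = {}
    for i, j, ok in pair_preds:
        groups.setdefault(j - i, []).append(ok)
    return {d: sum(not ok for ok in oks) / len(oks)
            for d, oks in sorted(groups.items())}
```

Plotting these ratios against d reproduces the kind of analysis shown in Figure 4.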
Figure 4 illustrates the error ratio and averaged prediction confidence for different values of d. B-TSort's classifier is more confident and accurate when determining the relative order of sentence pairs with larger distances, but is less confident and struggles to handle the relative order of nearby sentences. This is reasonable since nearby sentences share similar topics in content, so it is hard to determine their relative order without a larger context. The topological sorting algorithm does not consider content information and cannot deal with low-confidence predictions either.
Figure 4 also shows the error ratio of B-TSort plus COVER. Our method reduces 21% to 27% of the errors for sentence pairs with distance d ≤ 4 and reduces more than 50% of the errors for long-distance sentence pairs. This indicates that, based on Algorithm 1, COVER overcomes the limitations of the original topological sorting algorithm and gradually improves the predictions.

Effect of COVER for BERSON
We infer that one of the reasons COVER improves BERSON is that it alleviates the gap between training and inference. We conduct a controlled experiment to verify this assumption.
During inference, we experiment with different input orders to the decoder of BERSON: 1) perfect: the input order is the right-shift of the gold order, which is the same as in the training phase; 2) predicted: the input order follows the predicted order, which is the normal way for inference. In either case, we evaluate the outputs of the decoder. Table 3 shows the average performance over the four datasets. BERSON with perfect input orders sets an upper bound. In contrast, with the normal way of inference, BERSON's performance drops a lot because errors are likely during order generation and an imperfect preceding order affects future generation as well. With the help of COVER, BERSON can get a performance closer to that with perfect input order.
A natural assumption about the effect is that COVER improves the predictions in the early decoding stage so that future generation is based on a closer-to-perfect preceding order.

Ablation Study of COVER
We further investigate the effectiveness of the key designs of COVER, mainly from two aspects: the graph formulation and the training strategy.

Graph Formulation We focus on analyzing the importance of the skip edges and the paragraph node, which mainly encode ordering information.
Table 4 shows the results. For B-TSort, removing skip edges leads to a small performance decrease, while removing the paragraph node leads to a large decrease. The reason may be that the topological sorting algorithm depends on the predicted pair-wise relative orders but does not consider any content information, so encoding the content of a paragraph is more important. For BERSON, the paragraph node and the skip edges are both important. The skip edges explicitly connect preceding sentences and the candidate sentence, which may help deal with imperfect partial orders.
A state-of-the-art coherence model (Mesgar et al., 2021) is also used as the verifier. It improves B-TSort and BERSON, but has a certain gap with COVER, indicating the advantage of coherence verification and the designs of COVER.

Training Strategies We compare the random permutation and gradual permutation strategies. Table 5 shows the average performance over the four datasets. Gradual permutation consistently gets better performance than random permutation.
We further analyze the error ratio for sentences at different positions in the gold orders on the NIPS dataset. Table 6 shows that, using either strategy, our method can obviously reduce the error ratio at almost all positions. Random permutation outperforms BERSON at all positions, while gradual permutation has the lowest error ratio for sentences at the front and middle of the documents. This is because, with the training instances constructed by gradual permutation, the model can better compare the relative coherence of imperfect orders, so it can correct more errors in preceding sentences, making the decoding more robust. But gradual permutation has a slightly worse error ratio at the end of the documents. The reason may be that fewer of its training instances contain perfect orders, affecting the judgment for sentences at the end. In the future, we will investigate better sampling strategies that can keep a trade-off between random permutation and gradual permutation.
Connecting the above observations, COVER significantly improves the accuracy at the front of documents and can gradually improve the partial orderings.These factors can reasonably explain the effects of COVER for B-TSort and BERSON.

Predicting the First and Last Sentences
The first and last sentences are important to documents.Following previous studies, we report the performance of our model against two baselines in correctly predicting these two sentences on four benchmarks.
As displayed in Table 7, our method obtains significant improvements in predicting the first sentences across four benchmarks for both B-TSort and BERSON.However, it performs better than B-TSort but slightly worse than BERSON in predicting the last sentences.This observation is consistent with the analysis in §5.5.

Performance on Short and Long Documents
We conduct experiments on NIPS and AAN datasets to analyze the effects of COVER for short and long texts.The documents in the test set are divided into short ones (with less than 8 sentences) and long ones (with 8 or more sentences).There are 298 short and 79 long documents in NIPS, and 2358 short and 268 long documents in AAN.
Table 8 shows the results. For both baselines, our method achieves large improvements on both short and long documents.

Performance on Coherence Rating
We also evaluate our model on the summary coherence rating (SCR) task. We use the dataset proposed by Barzilay and Lapata (2008), which contains English summaries produced by human experts and an extractive summarization system. Each instance in the dataset is a pair of two summaries of the same text with different ratings. Table 9 shows that our model performs very close to the best performance on this task.

Conclusion
This paper has presented a novel sentence ordering method that incorporates a coherence verifier (COVER). We show that COVER works well with pair-wise ranking-based and sequence generation-based baselines. Our framework combines local evidence from the baselines and larger-context coherence from COVER, and can gradually improve partial orderings. The coherence verifier is independent of the sentence ordering task but can be optimized for sentence ordering (e.g., via gradual permutation), and can be pre-trained with multi-domain datasets, obtaining superior performance compared with domain-specific models. It is thus effective and easy to maintain and transfer.
Sentence ordering is often used as a training task for coherence modeling. This paper, however, suggests that coherence models can also support sentence ordering methods to correct incoherent texts. Coherence models are able to identify sentences that are not well connected; sentence ordering models can then be used to reorder these sentences to improve the coherence of the text with the assistance of the coherence models.

Limitations
While the proposed method performs well on four benchmarks, we discuss some of its limitations.
On one hand, as discussed in §5.5, our method is not accurate enough at predicting sentences at the end of documents. There may be better strategies to construct training samples so that the model can better take into account each part of the documents and make more accurate predictions.
On the other hand, our model is not pre-trained with more diverse domains and larger-scale data. Our datasets are limited to two types, i.e., paper abstracts and short stories, both of which have comparatively obvious ordering characteristics. In addition, we do not use some larger-scale datasets, such as NSF abstracts and arXiv abstracts, because of computation and time constraints. With more diverse and larger data, the performance of our model should be further improved.

A Detailed Experimental Results
Table 10 lists the detailed error ratio data in Figure 4 for reference. We also report the detailed results on the four benchmarks for the controlled experiment analyzing the gap between training and inference in §5.4 (Table 11), the ablation study of the graph formulation in §5.5 (Table 12), and the performance with random and gradual strategies in §5.5 (Table 13).

Figure 1 :
Figure 1: An example of how COVER works together with a sequence generation model. The model's encoder can encode a set of sentences in various ways, and its decoder iteratively generates the sentence order based on (accumulated) conditional scores. COVER is used as a coherence verifier to measure the coherence of a candidate order. It can be pre-trained and flexibly plugged into the decoding process through beam search.

Figure 2 :
Figure 2: A tournament digraph encoding the order and topic of sentences in a paragraph. Each sentence has a node, and the paragraph as a whole has a node.

Figure 3 :
Figure 3: An example for constructing training instances with the gradual permutation strategy.

Figure 4 :
Figure 4: The error ratio of predicted pair-wise relative orders w/ and w/o COVER for B-TSort and the average prediction confidence of B-TSort's pair-wise classifier.

Table 1 :
The general comparison results against two baselines and other recent methods on four datasets.

Table 2 :
Statistics of datasets used in our experiments.

Table 3 :
Average performance with perfect and predicted order as the input of BERSON's decoder.

Table 4 :
Ablation study of the graph formulation.

Table 5 :
Average performance with two strategies.

Table 6 :
Error ratio for sentences at different positions with random and gradual permutation strategies.

Table 7 :
Accuracy of predicting the first and last sentences on four benchmarks.

Table 8 :
Results on short and long texts in the NIPS and AAN datasets.

Table 9 :
Results for summary coherence rating in DUC2003 dataset.

Table 10 :
The detailed error ratio of predicted pair-wise relative orders w/ and w/o COVER for B-TSort.

Table 13 :
The detailed results for Table 5 in §5.5.