CANarEx: Contextually Aware Narrative Extraction for Semantically Rich Text-as-data Applications

,


Introduction
Narratives are accounts of connected events, and are a basic mechanism of communication and of understanding our world (Piper et al., 2021). "Narrative modelling" is the discovery of underlying narratives (such as "politician is liar" or "inflation is high"), as well as the analysis of how their shape and prominence shift over time (Zhang et al., 2019; Ash et al., 2021). Narrative modelling offers the potential to go beyond simple issue tagging or topic modelling to get at the richer question of how ideas are connected and framed, and when these relationships form, arise, and change. (Code: https://github.com/nandinisa/CANarEx)
In the realm of the social sciences, quantitative and causal analysis relies on structured, tabular data, with researchers recognising the potential for 'text as data' to greatly expand the realm of analysis into news, opinion, and political discourse (Gentzkow et al., 2019). In particular, if semantically related narratives can be reliably identified in a large corpus over time using narrative modelling, then established time-series techniques in the social sciences can be leveraged in a wide array of applications (Box-Steffensmeier et al., 2014). Recent work on narrative extraction such as Relatio (Ash et al., 2021) demonstrates the potential of such approaches to discourse analysis, when based on more traditional NLP techniques. The advances in Transformer-based language models such as BERT (Devlin et al., 2019) and GPT-3 (Brown et al., 2020) present the opportunity to vastly increase the power of such methods, incorporating a greater understanding of "context" into the model. Context is especially important to narratives since they are fundamentally composed of sequences of coupled language components.
We demonstrate and evaluate CANarEx, a contextually aware narrative modelling approach that employs state-of-the-art Transformer models to unlock their context sensitivity. We show the superiority of CANarEx's contextually aware narrative modelling technique over (already powerful) existing methods in the literature. We discuss how these models can be used to conduct previously infeasible analysis of shifts in narratives, illustrated through a social-science application: understanding social discourse over time. We employ generative text models tuned on narratives extracted from original news, opinion and parliamentary speeches to develop a highly realistic narrative time-series recovery evaluation task.

Related Work
Automated extraction of narratives and other artifacts for text as data is an area of ongoing research (Piper et al., 2021), spanning from classical NLP techniques to the new Transformer architectures. Given the typical multi-entity, discursive nature of most narratives, identifying entities and the interactions between them are common building blocks of narrative modelling approaches. These interactions can be framed as semantic triples extracted using a semantic parser and can take forms such as {Subject-Predicate-Object} or {Entity-Verb-Entity}, with the key idea being that the subject and object of the triple represent entities and the verb or predicate represents the interaction (or event) between them (Rospocher et al., 2016; Spiliopoulou et al., 2017; Ash et al., 2021). These E-V-Es provide a succinct summary of the underlying events and actors, addressing "who does what to whom" (He et al., 2015), and thus provide a framework that enables further downstream tasks. In terms of the "minimal model of narrativity" (Piper et al., 2021), the triples can be mapped to the features 'agents' and 'events'. Zhang et al. 
(2019) propose a narrative modelling framework with a particular focus on identifying economic activities (debates, conflicts, compromise) around social processes such as industrial regeneration. Towards this end, their work prioritises identifying the point of view of each entity, called attribution (beliefs, thoughts, speeches), in a given narrative, alongside identifying the entities and events involved. Their pipeline includes manually annotated custom event labelling, pre-trained named entity recognition (NER) models (Dernoncourt et al., 2017) and a semantic role labelling (SRL) ensemble comprising Semafor (Das et al., 2014) and DeepSRL (He et al., 2017) for event and attribute extraction. The results from the semantic frames (predicate with entities) are split into entities, extracted with the NER, and events, through mapping to the custom event labels that carry the event type attribution. Through this process, their framework is able to attribute the statements and intents within a narrative to specific actors.
The 'Relatio' framework (Ash et al., 2021) takes a more generalised approach to narrative modelling, seeking to extract narratives from any text data, and without extensive custom labelling. It enables identifying the opinions, inflection points and changing trends in public discourse through narrative extraction from news. Relatio uses SRL (Stanovsky et al., 2018) to identify the entities in the discourse and organise them into the {Entity-Verb-Entity} narrative structure, or E-V-Es. An additional step converts the extracted high-dimensional entities to lower-dimensional entities by clustering word vectors (Mikolov et al., 2013; Cer et al., 2018). The workflow is presented in Figure 1. The separation of the entities from their verbs during clustering results in a loss of contextual information.
As seen from these examples, modelling the multi-entity, discursive nature of narratives typically involves identifying entities and the interactions between them, with the rest of the pipeline reflecting the specific goals of the model. The implementations of these different tasks of the pipeline have evolved to reflect the new Transformer architectures (Vaswani et al., 2017) and associated improvements. Wankmüller (2021) provided a review of the possibilities of transfer learning with Transformers in the domain of social science studies, highlighting the higher predictive power and increased efficiency of these pretrained language models (PLMs) compared to the historically common supervised learning models. The CANarEx framework builds on the work of Relatio by incorporating more features to support complex narratives, and by updating the existing components to leverage Transformer-based models.

CANarEx
Here we propose the Contextually Aware Narrative Extraction (CANarEx) framework, which enables contextual narrative extraction. The framework is represented in Figure 3.

Contextual approach
CANarEx provides several improvements to the original Relatio framework. The entities of a narrative present themselves across multiple sentences and thus carry prior references to themselves and to other entities. These prior references need to be resolved in order to recover the context for the specific narrative. We solve this problem by adding co-reference resolution to the pipeline, translating all subsequent mentions in a document (pronouns and noun phrases) to their respective named entities. We use the SpanBERT-base model from Joshi et al. (2019a,b) with the domain set to newswire ('nw') for this step.
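As an illustration of what this resolution step produces, the substitution of later mentions by their antecedents can be sketched as follows (a minimal sketch, assuming mention clusters of inclusive token spans have already been produced by a coreference model; the function name and span format are illustrative, not the SpanBERT API):

```python
from typing import List, Tuple

def resolve_coreferences(tokens: List[str],
                         clusters: List[List[Tuple[int, int]]]) -> List[str]:
    """Replace every later mention in a cluster with the tokens of its
    first (antecedent) mention; spans are inclusive token indices."""
    replacements = {}                       # mention start -> (mention end, antecedent)
    for cluster in clusters:
        spans = sorted(cluster)
        start0, end0 = spans[0]
        antecedent = tokens[start0:end0 + 1]
        for start, end in spans[1:]:
            replacements[start] = (end, antecedent)

    resolved, i = [], 0
    while i < len(tokens):
        if i in replacements:
            end, antecedent = replacements[i]
            resolved.extend(antecedent)     # substitute the antecedent tokens
            i = end + 1
        else:
            resolved.append(tokens[i])
            i += 1
    return resolved
```

For example, with the cluster {"The minister", "she"}, the pronoun is rewritten to the named entity so that later SRL frames keep the entity explicit.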
Next, the "interactions" between these entities have to be recovered in order to formalise the actions being performed between entities. Relatio views this as a predicate-argument resolution problem, and leverages SRL to extract this structure, with the target outcome of identifying the E-V-Es embedded in the document. For SRL, we use the AllenNLP framework (Gardner et al., 2017), specifically the BERT-based models introduced by Shi and Lin (2019). The SRL step results in a frame file of predicate constructs, with the numbered arguments (ARGx, for example) representing entities interacting via the predicate V (verb). This captures how the entities are interacting, when and where. In CANarEx, the prior application of co-reference resolution ensures a more comprehensive E-V-E structure in this SRL step compared to the vanilla Relatio approach.
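The E-V-E extraction from a single SRL frame can be sketched as follows (a hypothetical helper operating on BIO-style role tags, the common output format of SRL models; it is not the paper's exact code):

```python
def frame_to_eve(tokens, tags):
    """Collect the ARG0 / V / ARG1 spans of one BIO-tagged SRL frame
    into an {Entity-Verb-Entity} triple; returns None if any part is missing."""
    spans = {"ARG0": [], "V": [], "ARG1": []}
    for token, tag in zip(tokens, tags):
        if tag == "O":
            continue
        role = tag.split("-", 1)[1]         # strip the B-/I- prefix
        if role in spans:
            spans[role].append(token)
    if all(spans.values()):                 # require both entities and a verb
        return tuple(" ".join(spans[r]) for r in ("ARG0", "V", "ARG1"))
    return None
```

Frames lacking either entity are discarded, matching the later requirement of a minimum of two entities per retained frame.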

Micro-narratives
In the next step, we further split the extracted E-V-Es into micro-narratives. Micro-narratives are narratives nested within other narratives, and are referred to as narrative levels in the minimal model of narrativity (Piper et al., 2021). SRL can generate multiple frames per sentence, one for each connecting verb. These frames can contain nested or overlapping narratives. We identify and disambiguate these constituent micro-narratives by checking whether the E-V-E extracted from one frame is a subset of an E-V-E from another frame of the same sentence. If so, we split the superset narrative into its constituent micro-narratives, as demonstrated via an example in Figure 2. These generated micro-narratives are cleaned using stopword removal.
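The subset check that drives this splitting can be sketched as follows (an illustrative simplification that compares word sets; the paper's frame-level implementation may differ):

```python
def split_micro_narratives(frames):
    """Keep the constituent micro-narratives: drop any E-V-E whose word
    set strictly contains the word set of another E-V-E from the same sentence."""
    word_sets = [set(" ".join(frame).split()) for frame in frames]
    kept = []
    for i, words in enumerate(word_sets):
        is_superset = any(j != i and other < words
                          for j, other in enumerate(word_sets))
        if not is_superset:
            kept.append(frames[i])
    return kept
```

When one frame fully contains the tokens of another, only the smaller constituent triples survive, yielding one micro-narrative per hinge verb.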

Narrative clustering
Next, we need to cluster the extracted artifacts to generate semantically similar groupings of narratives. Prior work in Relatio clustered the entities to achieve this goal. However, this fragmented approach ignores the connections between the entities and therefore loses the narrative aspect represented by the verb. We propose a modified approach that preserves contextual information: we compute sentence embeddings of the generated micro-narratives (E-V-Es) using SBERT (Reimers and Gurevych, 2019; Song et al., 2020), and perform clustering on these embeddings.
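A minimal sketch of the clustering step, assuming the micro-narrative embeddings have already been computed (e.g. with SBERT); the simple k-means with farthest-point initialisation below stands in for whichever clustering algorithm is actually used:

```python
import numpy as np

def kmeans(embeddings, k, iters=50):
    """Cluster L2-normalised embeddings by cosine similarity, using
    deterministic farthest-point initialisation."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroids = [X[0]]
    while len(centroids) < k:               # farthest-point initialisation
        sims = np.max(X @ np.array(centroids).T, axis=1)
        centroids.append(X[np.argmin(sims)])
    centroids = np.array(centroids)
    for _ in range(iters):
        labels = np.argmax(X @ centroids.T, axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):
                mean = members.mean(axis=0)
                centroids[c] = mean / np.linalg.norm(mean)
    return labels
```

Because whole E-V-E embeddings are clustered, narratives that share entities but differ in their verbs can land in different clusters, preserving the narrative aspect.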

Optional - filtering of micro-narratives
The use of co-reference resolution and micro-narratives generation results in a very large number of E-V-Es relative to the original text corpus. A proportion of these might be repetitive, necessitating a subsequent step to reduce this collection to a viable subset without loss of information. To filter out such E-V-Es, we evaluate document-level clustering (TopN) and Textrank as two filtering approaches.
The TopN document clustering approach assumes that each document has a main theme and a sub-theme. Therefore, we cluster all the sentences within each document into two clusters, pick the larger of the clusters to be the main theme, and then pick the top N sentences closest to the centroid of this main-theme cluster of the document.
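This TopN selection can be sketched as follows (a minimal sketch over precomputed sentence embeddings; the two-cluster split is a hand-rolled 2-means for self-containment, not the paper's exact implementation):

```python
import numpy as np

def topn_sentences(embeddings, n=3, iters=50):
    """Split a document's sentence embeddings into two clusters (main
    theme vs sub-theme) and return the indices of the n sentences
    closest to the main-theme centroid."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = np.stack([X[0], X[np.argmin(X @ X[0])]])     # farthest-point init
    for _ in range(iters):                           # simple 2-means
        labels = np.argmax(X @ c.T, axis=1)
        for k in range(2):
            if np.any(labels == k):
                mean = X[labels == k].mean(axis=0)
                c[k] = mean / np.linalg.norm(mean)
    main = int(np.argmax(np.bincount(labels, minlength=2)))  # larger cluster
    members = np.flatnonzero(labels == main)
    order = np.argsort(-(X[members] @ c[main]))      # closest to centroid first
    return members[order[:n]].tolist()
```

Only the returned sentence indices are passed on to micro-narrative extraction, so off-theme sentences are clipped away.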
Another approach uses the Textrank algorithm, an automated graph-based summarisation technique (Mihalcea and Tarau, 2004). This approach involves representing text as a graph with nodes (sentences) and edges (similarity between the connected sentences). The top sentences are determined using a form of the PageRank algorithm (Page et al., 1999), i.e. the most connected sentences. The sentence similarity can be calculated using common substrings, cosine similarity with TF-IDFs or sentence embeddings (Barrios et al., 2016; Kazemi et al., 2020). We use the version by Barrios et al. (2016) with a modified Okapi-BM25 (information retrieval ranking) function, and vary the ratio of the summary in proportion to the co-referenced sentences within a document as 0.05, 0.1, 0.2 and 0.4.
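A minimal Textrank sketch over a precomputed sentence-similarity matrix (the power-iteration PageRank below is illustrative; the paper uses the Barrios et al. (2016) BM25-based variant):

```python
import numpy as np

def textrank(sim, damping=0.85, iters=100):
    """Score sentences by PageRank over a symmetric sentence-similarity
    matrix; the diagonal is ignored."""
    W = sim.astype(float).copy()
    np.fill_diagonal(W, 0.0)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0                # guard isolated sentences
    T = W / col_sums                             # column-stochastic transitions
    n = len(W)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):                       # power iteration
        scores = (1 - damping) / n + damping * (T @ scores)
    return scores

def summarise(sim, ratio=0.2):
    """Keep the top `ratio` fraction of sentences by Textrank score."""
    scores = textrank(sim)
    keep = max(1, int(round(ratio * len(scores))))
    return sorted(np.argsort(-scores)[:keep].tolist())
```

Varying `ratio` here corresponds to the 0.05-0.4 summary ratios explored in the experiments.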
Both these approaches yield the top sentences within the documents, which are then used to extract micro-narratives. We perform a final clustering step on the filtered set of micro-narratives and choose the cluster with the largest silhouette score.

Data
We demonstrate the CANarEx framework by applying it to two major sources of data: the Hansard2 and Dow Jones Factiva3 datasets. The goal was to understand narratives of disadvantage in Australia, focusing on the group 'First Nations' (Aboriginal and Torres Strait Islander people). The choice of these two data sources reflects two perspectives on the discourse: the debates by policymakers in parliament (Hansard) and the reporting and editorials on the same topics by the news media (Factiva).
For the Hansard dataset, we filtered for the 'First Nations' group in speeches based on the presence of the keywords "first peopl", "first nations", "traditional own", "indigen", "aborigin", and "torres strait island". We have made this dataset available as supplementary material.
We had 10554 articles (461180 sentences) from Factiva and 7781 documents (440169 sentences) from Hansard describing the 'First Nations' group.
The results of the CANarEx framework are presented per dataset in Table 1. Each row of the table represents the progression through each step of the CANarEx framework. The vast majority of sentences in both datasets (69% Factiva and 79% Hansard) were changed by co-reference resolution, underlining the importance of retaining context and entities through this step. The application of SRL resulted in frames being generated for over 90% of these sentences. Using these SRL-extracted frames, we were able to generate 389429 micro-narratives from the Factiva dataset and 404237 from Hansard (retaining only the frames with a minimum of two entities). These were further cleaned using stopword removal and removal of the verbs {is, are}, reducing the micro-narrative count by about 25%. The optional filtering step was applied to these cleaned sets of micro-narratives to arrive at the result set.
The result of Textrank filtering on the topic of young families within the 'First Nations' Hansard data is shown in Figure 4. The links between the E-V-Es are weighted by frequency of occurrence. This enables us to identify the dominant micro-narratives per entity pair, such as the higher occurrence of e1:family - v:use - e2:child care compared to other micro-narratives. The representation is inspired by the Transformer attention mechanism (Vaswani et al., 2017). A Factiva example is provided in Appendix C.

Evaluation
We evaluate the performance of CANarEx by testing the recovery of the time-series narrative across the corpus. To do this, we rerun our pipeline on time-series clusters of synthetic data and compare the results with the output of Relatio, our baseline model.

GPT-3 fine-tuning
GPT-3 is fine-tuned to generate paraphrased sentences using similar sentence pairs as training data.
We use the original sentences extracted from the Factiva documents, and filter for sentences that remain similar post co-reference resolution. We do this to ensure that the samples generated from GPT-3 contain sentences with references to pronouns and entity mentions. We then perform SBERT's (Reimers and Gurevych, 2020) paraphrase mining on these filtered sentence embeddings. This results in semantically similar sentence pairs [x, y], the training data for GPT-3. We further require 0.75 ≤ cosine score < 0.99 and word length > 5, resulting in 4961 [x, y] sentence pairs. We fine-tune the Ada GPT-3 model on these sentence pairs for 4 epochs. This is presented in Figure 5 (a).
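The threshold filtering applied to the mined pairs can be sketched as follows (a minimal sketch; `filter_training_pairs` and the pair format are illustrative assumptions, not the paper's code):

```python
def filter_training_pairs(pairs, lo=0.75, hi=0.99, min_words=5):
    """Keep paraphrase-mined (x, y, cosine) pairs with lo <= cosine < hi
    and both sentences longer than min_words words."""
    kept = []
    for x, y, score in pairs:
        long_enough = min(len(x.split()), len(y.split())) > min_words
        if lo <= score < hi and long_enough:
            kept.append((x, y))
    return kept
```

The upper bound excludes near-duplicate pairs (which would teach the model to copy), while the lower bound excludes pairs that are not true paraphrases.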
Once the GPT-3 model is fine-tuned, we use this to generate synthetic test sentences.

Synthetic test data
The test sentences contain both topic cluster sentences and noise sentences. For topic sentences, we sample sentences from 3 chosen topic clusters of the Factiva dataset. These 3 clusters represent the topics climate change, Kevin Rudd and indigenous people (Figure 6 (a)); noise sentences were chosen farthest from the cluster centroids. In the case of noise, there could be a scenario of overlapping clusters, but since the points were retrieved from clusters not associated with the chosen 3 topic clusters, the impact is assumed to be minimal. We use these test sentences to generate synthetic (similar) sentences using the previously fine-tuned GPT-3, Figure 6 (b). We rerun paraphrase mining (Reimers and Gurevych, 2020) with the same threshold (0.75 ≤ cosine score < 0.99) as for the training dataset to further clean the generated test dataset. This filters out noisy data, and acts as a data-quality verification of the output of GPT-3.
We also generate 3 distinct time-series signals: a periodic pulse, a periodic spike and a typical time-series pattern with random rise and fall. These are then randomly paired with the synthetic sentences (topic cluster sentences) of the 3 topics generated using GPT-3. This generates a time series of synthetic clusters over time. At each time point, the height of the signal represents the number N of sentences associated with a cluster C, Figure 7. The sentences are then collated into random documents. The distribution of synthetic test sentences into documents mirrors the per-document sentence distribution of the input Factiva data. Noise varied randomly between 0.1 and 0.3% of the number of sentences in the document.
We run our CANarEx pipeline and the Relatio model on these sentences (collated into random documents) to extract narratives. The synthetic test data creation and evaluation pipeline is presented in Figure 5 (b).
Finally, we design synthetic data for two scenarios (see Table 2): one with co-reference-resolved sentence pairs, and one without co-reference resolution. The latter synthetic data is constructed specifically to test Relatio, as it does not do co-reference resolution. The FIT-SNE (Linderman et al., 2019) approach is used to plot the lower-dimensional representation of the sentence embeddings of the Factiva dataset.

Evaluation pipeline
The overall idea of the regression-based evaluation is to validate the number of time-series sentences retained by the model and whether it recovers the temporal signature of the original time-series sentences. We also do A/B testing and compare precision/recall scores for the recovery of the synthetic time-series sentences belonging to each cluster C1-C3.
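The two evaluation measures can be sketched as follows (a minimal sketch; the peak normalisation inside `timeseries_mse` is an assumption made here to keep series with different totals comparable):

```python
import numpy as np

def timeseries_mse(true_counts, recovered_counts):
    """MSE between an original cluster time series and a recovered one,
    normalised by each series' peak so totals stay comparable."""
    a = np.asarray(true_counts, dtype=float)
    b = np.asarray(recovered_counts, dtype=float)
    a = a / max(a.max(), 1.0)
    b = b / max(b.max(), 1.0)
    return float(np.mean((a - b) ** 2))

def precision_recall(true_ids, recovered_ids):
    """Sentence-level recovery scores for one synthetic cluster."""
    true_set, rec_set = set(true_ids), set(recovered_ids)
    tp = len(true_set & rec_set)
    precision = tp / len(rec_set) if rec_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    return precision, recall
```

Each recovered cluster is matched against the synthetic cluster that yields the lowest MSE, and precision/recall is then reported per matched pair.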

Baseline model
We ran Relatio (baseline) with default settings on our synthetic narrative time-series clusters. The dimensionality-reduction clustering for entities was evaluated for cluster counts 5, 10, 20, 30, 40, 50 and 100. We re-run clustering as a downstream task on the dimensionality-reduced narratives to recover the time-series narratives. We report the MSE results of this in comparison with our model. We report the comparison for 2 of the entity cluster counts of Relatio: 40 (the best entity mapping as determined through a manual audit) and 100 (the default Relatio entity cluster count). We also compare our model to a null model. We generate the null model by randomly assigning cluster numbers to sentences and evaluating the MSE of such a model.
The results are presented in Table 3. The performance is tabulated per model variant, and per synthetic cluster for each model variant. The performance of CANarEx is better than both Relatio and the null model for each of the three clusters (lowest MSE across clusters). CANarEx uncovered 4 clusters, while Relatio identified 9 (for entity cluster count 40) and 6 (for entity cluster count 100). The MSEs are presented for the recovered clusters with the lowest score in comparison to the original synthetic clusters. The recovery is presented in Figure 8 for the CANarEx model. A similar analysis with the inclusion of the filtering step is documented in Appendix B. Most of the noise was categorised into a separate cluster.
An A/B test on the percentage of sentences recovered across clusters between CANarEx and Relatio (one-sided t-test) gives +6.943 (p-value: 0.001): 95% of the time, CANarEx recovers at least 26.8% more sentences than Relatio. The precision/recall scores for the recovery of the synthetic time-series sentences belonging to each cluster C1-C3 are provided in Table 4.

Ablation studies
Given the multi-component nature of our framework, we also evaluated its performance through an ablation study. The ablation study compared four configurations of CANarEx: with and without co-reference resolution, and with and without micro-narratives generation. Table 5 presents the evaluation of CANarEx over all extracted narratives. This evaluation does not include the optional filtering step, and exercises the framework for the four combinations of co-reference resolution and micro-narratives generation. The performance of the framework when both co-reference resolution and micro-narratives generation are used is marginally worse than the best-performing configuration (co-reference resolution only). The performance is markedly worse when co-reference resolution is not used. This is understandable, as removing co-reference resolution removes all the subsequent mentions of an entity and therefore the extracted narratives are not contextual. Turning off co-reference resolution also has a cascading effect on the filtering mechanisms. Ablation studies of the optional filtering step are discussed in Appendix D.
The generation of micro-narratives increases the error rate, but only marginally. We expect this to be a consequence of the narrative-splitting step introducing a degree of noise into the resulting set of micro-narratives. Secondly, the test data synthesis is geared to associate each sentence with only one cluster. In contrast, the CANarEx framework identifies more than one micro-narrative in some of the test sentences, and these micro-narratives can belong to different clusters, therefore penalising the MSE. Despite these observations, the micro-narratives still retain the concepts, as demonstrated via the embedding space (see Figure 9). Therefore, the marginal increase in the error rate is an acceptable trade-off given the improved readability and recovery enabled through micro-narratives generation.

Qualitative analysis
We qualitatively evaluate the performance of CANarEx by analysing the micro-narratives it extracted for the topic of indigenous people from the synthetic text corpus. In particular, we examine the micro-narratives for the entity 'indigenous voice' within this corpus, as it is one of the central entities of the topic and is associated with multiple themes and narratives. Results of the extraction with TopN document-level filtering are presented in Figure 10 (see Appendix E for results of CANarEx without filtering and with Textrank filtering). As is evident, the extracted micro-narratives are all aligned with the public discourse on indigenous people in Australia. The verbs identified with indigenous voice are enshrine, have, need, provide, will be, would be, and are linked with strong narratives like have strong national voice, have legislative voice, and enshrine (in the) constitution.

Conclusion
In this work we introduced the CANarEx framework for contextually aware narrative extraction for text-as-data applications. The contextual aspects of the narrative are retained via the co-reference resolution step and by the use of the complete E-V-E triples for clustering. We also present an optional step that enables filtering of the extracted micro-narratives. We demonstrate the framework on two datasets: parliamentary proceedings and news articles. Further, we generate synthetic test data using a novel GPT-3-based technique for evaluation of the framework, and demonstrate better performance than the baseline model. The results are reinforced through an ablation study exercising the various components of the pipeline, and a qualitative study demonstrating the applicability of the framework.

Limitations
The CANarEx framework's performance is limited by the underlying models that it leverages for co-reference resolution and SRL extraction, although we note that these models can be substituted with improved variants when available. Further evaluation is required to understand the performance of SRL extraction on corpora enhanced with co-reference resolution. One option would be to perform co-reference resolution and SRL simultaneously. SRL extraction can generate E-V-Es that are not narratives, and the generation of micro-narratives introduces a degree of noise as well. We mitigate this through filtering and stop word removal, but a qualitative analysis of the resulting micro-narratives is an avenue for improvement. Finally, the clustering approach that we employ to reduce the number of E-V-Es to a viable subset can also remove valid E-V-Es through overly aggressive filtering. While there is some penalty from clustering only the top sentences, Textrank performs better than document-level clustering, and the improvement is correlated with the ratio of sentences extracted.

B Appendix: Evaluation results with the filtering step
Tables B1 and B2 show the performance of CANarEx after applying the two filtering techniques (document-level clustering and Textrank, respectively). We note that the reduction of E-V-Es resulting from these techniques does penalise the MSE of CANarEx, although the performance continues to be better than the baseline model.

D Appendix: Ablation study of the filtering step
Continuing the ablation study with the inclusion of the optional filtering step, we observe that the same pattern observed in the main study applies here as well (see Table D1 for filtering through Textrank clustering, and Table D2 for filtering through TopN document-level clustering). Finally, comparing along the dimension of filtering mechanisms, Textrank performs better than TopN document-level clustering, as the latter clips narratives aggressively.
E Appendix: Qualitative study
The unfiltered set of micro-narratives is large, and the links between the E-V-Es are more diffuse compared to the filtered sets. The filtering process results in a clearer association between the entities of the retained E-V-Es.

Figure 2 :
Figure 2: Micro-narrative extraction. Two hinge verbs are identified in the sentence, been (A) and proposed (B). By splitting, both verbs are identified (C), and co-referencing correctly resolves the entities to provide the final narratives (D).

Figure 3 :
Figure 3: The CANarEx framework (including filtering for top narratives)

Figure 4 :
Figure 4: CANarEx generated micro-narratives on topic of young families, derived from the Hansard dataset.
Figure 7: (a) Noise sentences were sampled from clusters other than the chosen topic clusters. Topic cluster sentences were sampled by closest Euclidean distance to the centroid. T = {0, 1, . . ., 500}; C = {C1, . . .}; D = {d1, d2, . . ., dn}; d = {S1, S2, . . ., Sm}.

Figure 5 :
Figure 5: Evaluation pipeline (a) Fine-tune GPT-3 using similar sentences and sample synthetic data from GPT-3 (b) Generating synthetic time-series cluster sentences and evaluating the pipeline.

Figure 6 :
Figure 6: Test clusters (original data). (a) Original Factiva data; (b) GPT-3 fine-tuned data with noise. The FIT-SNE (Linderman et al., 2019) approach is used to plot the lower-dimensional representation of the sentence embeddings of the Factiva dataset.

Figure 10 :
Figure 10: CANarEx generated micro-narratives on topic of indigenous voice on synthetic test data (filtered via TopN document level clustering).

C Appendix: Factiva result example
The result of Textrank filtering on the topic of crime within the 'First Nations' Factiva data is shown in Figure C1. The links between the E-V-Es are weighted by frequency of occurrence. This enables us to identify the dominant micro-narratives per entity pair, such as the higher occurrence of e1:mandatory sentence - v:produce - e2:result compared to other micro-narratives.

Figure C1 :
Figure C1: CANarEx generated micro-narratives on topic of criminal justice, derived from the Factiva dataset.

Figure E1 :
Figure E1: CANarEx generated micro-narratives on topic of indigenous voice on synthetic test data (unfiltered result set).

Figure E2 :
Figure E2: CANarEx generated micro-narratives on topic of indigenous voice on synthetic test data (filtered via Textrank).

Table 1 :
Factiva and Hansard results

Table 3 :
Evaluation results of CANarEx with baseline model (Relatio) and null model (full table in Appendix A, A1).

Table 4 :
Precision/Recall scores for the recovery of the synthetic time-series sentences belonging to each cluster. Cluster 4 here is assigned to noise.

Table 5 :
CANarEx performance over all of the extracted narratives from the synthetic narrative time-series data (full table in Appendix A, A2).

Table A1 :
Evaluation results of CANarEx with baseline model (Relatio) and null model

Table A2 :
CANarEx performance over all of the extracted narratives from the synthetic narrative time-series data.

Table B1 :
Evaluation results of CANarEx with Textrank filtering, compared to baseline model (Relatio)

Table B2 :
Evaluation results of CANarEx with TopN document level filtering, compared to baseline model (Relatio)

Table D1 :
CANarEx performance over the extracted narratives filtered through Textrank clustering.

Table D2 :
CANarEx performance over the extracted narratives filtered through TopN document level clustering.