CANarEx: Contextually Aware Narrative Extraction for Semantically Rich Text-as-data Applications

Nandini Anantharama; Simon Angus; Lachlan O’Neill

doi:10.18653/v1/2022.findings-emnlp.260

Abstract

Narrative modelling is an area of active research, motivated by the recognition of narratives as drivers of societal decision making. These research efforts conceptualize narratives as connected entity chains, and modelling typically focuses on identifying entities and their connections within a text. An emerging approach to narrative modelling uses semantic role labeling (SRL) to extract Entity-Verb-Entity (E-V-E) tuples from a text, followed by dimensionality reduction to reduce the space of entities and connections separately. This process penalises the semantic richness of narratives and discards much contextual information along the way. Here, we propose an alternative narrative extraction approach, CANarEx, which incorporates a pipeline of common contextual constructs through co-reference resolution, micro-narrative generation, and clustering of these narratives through sentence embeddings. We evaluate our approach by testing the recovery of “narrative time-series clusters”, mimicking a desirable text-as-data task. The evaluation framework leverages synthetic data generated using a GPT-3 model, which is trained to generate similar sentences using a large dataset of news articles. The synthetic data maps to three topics in the news dataset. We then generate narrative time-series document cluster representations by mapping the synthetic data to three distinct signals synthetically injected into the testing corpus. Evaluation results demonstrate the superior ability of CANarEx to recover narrative time-series, with reduced MSE and improved precision/recall relative to existing methods. The validity of these results is further reinforced through ablation studies and qualitative analysis.
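The evaluation idea described in the abstract, comparing an injected signal with the time series recovered from a narrative cluster, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sum-to-one normalisation, and the set-based document matching are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import Sequence, Set, Tuple


def normalise(series: Sequence[float]) -> list:
    """Scale a series so it sums to 1 (assumed normalisation, not from the paper)."""
    total = sum(series)
    if total == 0:
        return [0.0 for _ in series]
    return [x / total for x in series]


def mse(injected: Sequence[float], recovered: Sequence[float]) -> float:
    """Mean squared error between two normalised time series of equal length."""
    if len(injected) != len(recovered):
        raise ValueError("series must cover the same time periods")
    a, b = normalise(injected), normalise(recovered)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def precision_recall(true_docs: Set[str], predicted_docs: Set[str]) -> Tuple[float, float]:
    """Precision and recall of the documents assigned to a narrative cluster."""
    true_positives = len(true_docs & predicted_docs)
    precision = true_positives / len(predicted_docs) if predicted_docs else 0.0
    recall = true_positives / len(true_docs) if true_docs else 0.0
    return precision, recall


# Hypothetical data: synthetic documents injected per period vs. documents
# a cluster recovered per period, plus the document ids involved.
injected_signal = [1, 2, 3]
recovered_signal = [2, 4, 6]  # same shape, different scale
error = mse(injected_signal, recovered_signal)
precision, recall = precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"})
```

Under these assumptions, a recovered series with the same shape as the injected signal gives an MSE close to zero regardless of scale, while precision and recall measure how cleanly the cluster isolates the injected documents.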

Anthology ID: 2022.findings-emnlp.260
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3551–3564
URL: https://aclanthology.org/2022.findings-emnlp.260/
DOI: 10.18653/v1/2022.findings-emnlp.260
Cite (ACL): Nandini Anantharama, Simon Angus, and Lachlan O’Neill. 2022. CANarEx: Contextually Aware Narrative Extraction for Semantically Rich Text-as-data Applications. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3551–3564, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): CANarEx: Contextually Aware Narrative Extraction for Semantically Rich Text-as-data Applications (Anantharama et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.260.pdf
Software: 2022.findings-emnlp.260.software.zip
Dataset: 2022.findings-emnlp.260.dataset.zip
