Sabyasachee Baruah

2025

CHATTER: A character-attribution dataset for narrative understanding
Sabyasachee Baruah | Shrikanth Narayanan
Proceedings of the 7th Workshop on Narrative Understanding

Computational narrative understanding studies the identification, description, and interaction of the elements of a narrative: characters, attributes, events, and relations. Narrative research has given considerable attention to defining and classifying character types. However, these character-type taxonomies do not generalize well because they are small, too simple, or specific to a domain. We require robust and reliable benchmarks to test whether narrative models truly understand the nuances of the character’s development in the story. Our work addresses this by curating the CHATTER dataset that labels whether a character portrays some attribute for 88124 character-attribute pairs, encompassing 2998 characters, 12967 attributes and 660 movies. We validate a subset of CHATTER, called CHATTEREVAL, using human annotations to serve as an evaluation benchmark for the character attribution task in movie scripts. CHATTEREVAL also assesses narrative understanding and the long-context modeling capacity of language models.

2023

Character Coreference Resolution in Movie Screenplays
Sabyasachee Baruah | Shrikanth Narayanan
Findings of the Association for Computational Linguistics: ACL 2023

Movie screenplays have a distinct narrative structure that segments the story into scenes containing interleaving descriptions of actions, locations, and character dialogues. A typical screenplay spans several scenes and can include long-range dependencies between characters and events. A holistic document-level understanding of the screenplay requires several natural language processing capabilities, such as parsing, character identification, coreference resolution, action recognition, summarization, and attribute discovery. In this work, we develop scalable and robust methods to extract the structural information and character coreference clusters from full-length movie screenplays. We curate two datasets, MovieParse and MovieCoref, for screenplay parsing and character coreference, respectively. We build a robust screenplay parser to handle inconsistencies in screenplay formatting and leverage the parsed output to link co-referring character mentions. Our coreference models can scale to long screenplay documents without drastically increasing their memory footprints.

2021

Annotation and Evaluation of Coreference Resolution in Screenplays
Sabyasachee Baruah | Sandeep Nallan Chakravarthula | Shrikanth Narayanan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

A Simple Three-Step Approach for the Automatic Detection of Exaggerated Statements in Health Science News
Jasabanta Patro | Sabyasachee Baruah
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

There is a huge difference between a scientific journal reporting ‘wine consumption might be correlated to cancer’ and a media outlet publishing ‘wine causes cancer’ while citing the journal’s results. This example is a typical case of a scientific statement being exaggerated as an outcome of the rising problem of media manipulation. Given a pair of statements (say, one from the source journal article and the other from the news article covering the results published in the journal), is it possible to ascertain with some confidence whether one is an exaggerated version of the other? This paper presents a surprisingly simple yet rational three-step approach that performs best for this task. We solve the task by breaking it into three sub-tasks: (a) given a statement from a scientific paper or press release, we first extract relation phrases (e.g., ‘causes’ versus ‘might be correlated to’) connecting the dependent (e.g., ‘cancer’) and the independent (‘wine’) variable, (b) classify the strength of the extracted relation phrase, and (c) compare the strengths of the relation phrases extracted from the statements to identify whether one statement contains an exaggerated version of the other, and to what extent. Through rigorous experiments, we demonstrate that our simple approach by far outperforms baseline models that compare state-of-the-art embeddings of the statement pairs through a binary classifier or recast the problem as a textual entailment task, which appears to be a very natural choice in this setting.
