Michael Sheriff
2020
Automatic extraction of personal events from dialogue
Joshua Eisenberg
|
Michael Sheriff
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events
In this paper we introduce the problem of extracting events from dialogue. Previous work on event extraction focused on newswire, however we are interested in extracting events from spoken dialogue. To ground this study, we annotated dialogue transcripts from fourteen episodes of the podcast This American Life. This corpus contains 1,038 utterances, made up of 16,962 tokens, of which 3,664 represent events. The agreement for this corpus has a Cohen’s Kappa of 0.83. We have open-sourced this corpus for the NLP community. With this corpus in hand, we trained support vector machines (SVM) to correctly classify these phenomena with 0.68 F1, when using episode-fold cross-validation. This is nearly 100% higher F1 than the baseline classifier. The SVM models achieved performance of over 0.75 F1 on some testing folds. We report the results for SVM classifiers trained with four different types of features (verb classes, part of speech tags, named entities, and semantic role labels), and different machine learning protocols (under-sampling and trigram context). This work is grounded in narratology and computational models of narrative. It is useful for extracting events, plot, and story content from spoken dialogue.
Search