Unsupervised Event Clustering and Aggregation from Newswire and Web Articles
Swen Ribeiro | Olivier Ferret | Xavier Tannier
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
In this paper, we present an unsupervised pipeline for clustering news articles according to the event instances identified in their content. We leverage press agency newswires and monolingual word alignment techniques to build meaningful and linguistically varied clusters of Web articles, as a step toward a broader event type detection task. We validate our approach on a manually annotated corpus of Web articles.