Unsupervised Event Clustering and Aggregation from Newswire and Web Articles

Swen Ribeiro, Olivier Ferret, Xavier Tannier


Abstract
In this paper, we present an unsupervised pipeline approach for clustering news articles based on the event instances identified in their content. We leverage press agency newswire and monolingual word alignment techniques to build meaningful and linguistically varied clusters of articles from the Web, with a view to a broader event type detection task. We validate our approach on a manually annotated corpus of Web articles.
Anthology ID:
W17-4211
Volume:
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Octavian Popescu, Carlo Strapparava
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
62–67
URL:
https://aclanthology.org/W17-4211
DOI:
10.18653/v1/W17-4211
Cite (ACL):
Swen Ribeiro, Olivier Ferret, and Xavier Tannier. 2017. Unsupervised Event Clustering and Aggregation from Newswire and Web Articles. In Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism, pages 62–67, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Event Clustering and Aggregation from Newswire and Web Articles (Ribeiro et al., 2017)
PDF:
https://aclanthology.org/W17-4211.pdf