%0 Conference Proceedings
%T Multilingual Clustering of Streaming News
%A Miranda, Sebastião
%A Znotiņš, Artūrs
%A Cohen, Shay B.
%A Barzdins, Guntis
%Y Riloff, Ellen
%Y Chiang, David
%Y Hockenmaier, Julia
%Y Tsujii, Jun’ichi
%S Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
%D 2018
%8 oct nov
%I Association for Computational Linguistics
%C Brussels, Belgium
%F miranda-etal-2018-multilingual
%X Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual clusters. Unlike typical clustering approaches that report results on datasets with a small and known number of labels, we tackle the problem of discovering an ever growing number of cluster labels in an online fashion, using real news datasets in multiple languages. In our formulation, the monolingual clusters group together documents while the crosslingual clusters group together monolingual clusters, one per language that appears in the stream. Our method is simple to implement, computationally efficient and produces state-of-the-art results on datasets in German, English and Spanish.
%R 10.18653/v1/D18-1483
%U https://aclanthology.org/D18-1483
%U https://doi.org/10.18653/v1/D18-1483
%P 4535-4544