Arezoo Hatefi
2024
PromptStream: Self-Supervised News Story Discovery Using Topic-Aware Article Representations
Arezoo Hatefi | Anton Eklund | Mona Forsman
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Given the importance of identifying and monitoring news stories within the continuous flow of news articles, this paper presents PromptStream, a novel method for unsupervised news story discovery. In order to identify coherent and comprehensive stories across the stream, it is crucial to create article representations that incorporate as much topic-related information from the articles as possible. PromptStream constructs these article embeddings using cloze-style prompting. These representations continually adjust to the evolving context of the news stream through self-supervised learning, employing a contrastive loss and a memory of the most confident article-story assignments from the most recent days. Extensive experiments with real news datasets highlight the notable performance of our model, establishing a new state of the art. Additionally, we delve into selected news stories to reveal how the model’s structuring of the article stream aligns with story progression.
2023
ADCluster: Adaptive Deep Clustering for Unsupervised Learning from Unlabeled Documents
Arezoo Hatefi | Xuan-Son Vu | Monowar Bhuyan | Frank Drewes
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)