Tim Gollub
2025
From Keyterms to Context: Exploring Topic Description Generation in Scientific Corpora
Pierre Achkar | Satiyabooshan Murugaboopathy | Anne Kreuter | Tim Gollub | Martin Potthast | Yuri Campbell
Proceedings of The 5th New Frontiers in Summarization Workshop
Topic models represent topics as ranked term lists, which are often hard to interpret in scientific domains. We explore Topic Description for Scientific Corpora, an approach to generating structured summaries for topic-specific document sets. We propose and investigate two LLM-based pipelines: Selective Context Summarisation (SCS), which uses maximum marginal relevance to select representative documents; and Compressed Context Summarisation (CCS), a hierarchical approach that compresses document sets through iterative summarisation. We evaluate both methods using SUPERT and multi-model LLM-as-a-Judge across three topic modeling backbones and three scientific corpora. Our preliminary results suggest that SCS tends to outperform CCS in quality and robustness, while CCS shows potential advantages on larger topics. Our findings highlight interesting trade-offs between selective and compressed strategies for topic-level summarisation in scientific domains. We release code and data for two of the three datasets.
2023
Webis @ ImageArg 2023: Embedding-based Stance and Persuasiveness Classification
Islam Torky | Simon Ruth | Shashi Sharma | Mohamed Salama | Krishna Chaitanya | Tim Gollub | Johannes Kiesel | Benno Stein
Proceedings of the 10th Workshop on Argument Mining
This paper reports on the submissions of Webis to the two subtasks of ImageArg 2023. For the subtask of argumentative stance classification, we reached an F1 score of 0.84 using a BERT model for sequence classification. For the subtask of image persuasiveness classification, we reached an F1 score of 0.56 using CLIP embeddings and a neural network model, achieving the best performance for this subtask in the competition. Our analysis reveals that seemingly clear sentences (e.g., “I support gun control”) are still problematic for our otherwise competitive stance classifier and that ignoring the tweet text for image persuasiveness prediction leads to a model that is about as effective as our top-performing model.
SemEval-2023 Task 5: Clickbait Spoiling
Maik Fröbe | Benno Stein | Tim Gollub | Matthias Hagen | Martin Potthast
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
In this overview paper, we report on the second PAN Clickbait Challenge hosted as Task 5 at SemEval 2023. The challenge’s focus is to better support social media users by automatically generating short spoilers that close the curiosity gap induced by a clickbait post. We organized two subtasks: (1) spoiler type classification to assess what kind of spoiler a clickbait post warrants (e.g., a phrase), and (2) spoiler generation to generate an actual spoiler for a clickbait post.
2018
Crowdsourcing a Large Corpus of Clickbait on Twitter
Martin Potthast | Tim Gollub | Kristof Komlossy | Sebastian Schuster | Matti Wiegmann | Erika Patricia Garces Fernandez | Matthias Hagen | Benno Stein
Proceedings of the 27th International Conference on Computational Linguistics
Clickbait has become a nuisance on social media. To address the urgent task of clickbait detection, we constructed a new corpus of 38,517 annotated Twitter tweets, the Webis Clickbait Corpus 2017. To avoid biases in terms of publisher and topic, tweets were sampled from the top 27 most retweeted news publishers, covering a period of 150 days. Each tweet has been annotated on a 4-point scale by five annotators recruited at Amazon’s Mechanical Turk. The corpus has been employed to evaluate 12 clickbait detectors submitted to the Clickbait Challenge 2017. Download: https://webis.de/data/webis-clickbait-17.html Challenge: https://clickbait-challenge.org