Heiko Paulheim


2024

Train Once, Use Flexibly: A Modular Framework for Multi-Aspect Neural News Recommendation
Andreea Iana | Goran Glavaš | Heiko Paulheim
Findings of the Association for Computational Linguistics: EMNLP 2024

Recent neural news recommenders (NNRs) extend content-based recommendation (1) by aligning additional aspects (e.g., topic, sentiment) between candidate news and user history or (2) by diversifying recommendations w.r.t. these aspects. This customization is achieved by “hardcoding” additional constraints into the NNR’s architecture and/or training objectives: any change in the desired recommendation behavior thus requires retraining the model with a modified objective. This impedes widespread adoption of multi-aspect news recommenders. In this work, we introduce MANNeR, a modular framework for multi-aspect neural news recommendation that supports on-the-fly customization over individual aspects at inference time. With metric-based learning as its backbone, MANNeR learns aspect-specialized news encoders and then flexibly and linearly combines the resulting aspect-specific similarity scores into different ranking functions, alleviating the need for ranking function-specific retraining of the model. Extensive experimental results show that MANNeR consistently outperforms state-of-the-art NNRs on both standard content-based recommendation and single- and multi-aspect customization. Lastly, we validate that MANNeR’s aspect-customization module is robust to language and domain transfer.
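As a rough illustration of the inference-time behavior described above, the sketch below linearly combines aspect-specific similarity scores into a ranking function. The encoder interface, the mean-pooled user representation, and the weight names are assumptions for illustration, not MANNeR's actual implementation.

import numpy as np

def rank_candidates(candidates, user_history, encoders, weights):
    """Rank candidates by a weighted sum of aspect-specific similarity scores.

    candidates:   list of candidate news items
    user_history: list of previously read news items
    encoders:     dict mapping aspect name -> callable returning a vector
    weights:      dict mapping aspect name -> coefficient chosen at inference time
    """
    scores = np.zeros(len(candidates))
    for aspect, encode in encoders.items():
        # Aspect-specialized embeddings for candidates and the user's history.
        cand_vecs = np.stack([encode(c) for c in candidates])
        hist_vecs = np.stack([encode(h) for h in user_history])
        # Similarity of each candidate to an aggregated user representation.
        user_vec = hist_vecs.mean(axis=0)
        scores += weights[aspect] * (cand_vecs @ user_vec)
    # Higher combined score = higher rank.
    return np.argsort(-scores)

Changing the entries of weights at inference time yields a different ranking function without retraining any encoder, which is the on-the-fly customization the abstract refers to.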

2023

NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation
Andreea Iana | Goran Glavaš | Heiko Paulheim
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

NewsRecLib is an open-source library based on PyTorch-Lightning and Hydra developed for training and evaluating neural news recommendation models. The foremost goals of NewsRecLib are to promote reproducible research and rigorous experimental evaluation by (i) providing a unified and highly configurable framework for exhaustive experimental studies and (ii) enabling a thorough analysis of the performance contribution of different model architecture components and training regimes. NewsRecLib is highly modular, allows specifying experiments in a single configuration file, and includes extensive logging facilities. Moreover, NewsRecLib provides out-of-the-box implementations of several prominent neural models, training methods, standard evaluation benchmarks, and evaluation metrics for news recommendation.
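For readers unfamiliar with the Hydra-based, single-configuration-file setup mentioned above, the sketch below shows the general pattern of wiring a PyTorch-Lightning training run through a Hydra config. The config keys, class paths, and file names are hypothetical and do not reflect NewsRecLib's actual API.

# conf/experiment.yaml (a single configuration file, schematically):
#   model:
#     _target_: my_project.models.NewsRecModule    # hypothetical class path
#     embedding_dim: 300
#   datamodule:
#     _target_: my_project.data.NewsDataModule     # hypothetical class path
#     batch_size: 64
#   trainer:
#     max_epochs: 5

import hydra
from omegaconf import DictConfig
import pytorch_lightning as pl

@hydra.main(config_path="conf", config_name="experiment", version_base=None)
def run(cfg: DictConfig) -> None:
    # Hydra builds the model and datamodule from the _target_ entries in the config,
    # so swapping architectures or datasets only touches the YAML file.
    model = hydra.utils.instantiate(cfg.model)
    datamodule = hydra.utils.instantiate(cfg.datamodule)
    trainer = pl.Trainer(max_epochs=cfg.trainer.max_epochs)
    trainer.fit(model, datamodule=datamodule)

if __name__ == "__main__":
    run()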

2020

KGvec2go – Knowledge Graph Embeddings as a Service
Jan Portisch | Michael Hladik | Heiko Paulheim
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we present KGvec2go, a Web API for accessing and consuming graph embeddings in a lightweight fashion in downstream applications. Currently, we serve pre-trained embeddings for four knowledge graphs. We introduce the service and its usage, and we further show that the trained models have semantic value by evaluating them on multiple semantic benchmarks. The evaluation also reveals that the combination of multiple models can lead to a better outcome than the best individual model.
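The consumption pattern the paper targets is a plain HTTP call from a downstream application; the client sketch below illustrates this pattern. The route, parameter names, and JSON shape are assumptions for illustration rather than the documented KGvec2go API.

from typing import List, Optional
import requests

# Hypothetical endpoint shape for a KGvec2go-style embedding service; consult the
# service documentation for the actual routes and supported knowledge graphs.
BASE_URL = "http://kgvec2go.org/rest"

def get_vector(knowledge_graph: str, term: str) -> Optional[List[float]]:
    """Fetch a pre-trained graph embedding for a term from the web service."""
    response = requests.get(f"{BASE_URL}/get-vector/{knowledge_graph}/{term}", timeout=10)
    response.raise_for_status()
    payload = response.json()
    # Return the vector if the term is known to the chosen knowledge graph.
    return payload.get("vector")

if __name__ == "__main__":
    vec = get_vector("wordnet", "dog")
    print(len(vec) if vec else "term not found")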

2016

A Large DataBase of Hypernymy Relations Extracted from the Web.
Julian Seitner | Christian Bizer | Kai Eckert | Stefano Faralli | Robert Meusel | Heiko Paulheim | Simone Paolo Ponzetto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Hypernymy relations (those where a hyponym term shares an “is-a” relationship with its hypernym) play a key role in many Natural Language Processing (NLP) tasks, e.g., ontology learning, automatically building or extending knowledge bases, or word sense disambiguation and induction. In fact, such relations may provide the basis for the construction of more complex structures such as taxonomies, or be used as effective background knowledge for many word understanding applications. We present a publicly available database containing more than 400 million hypernymy relations that we extracted from the CommonCrawl web corpus. We describe the infrastructure we developed to iterate over the web corpus, extract the hypernymy relations, and store them efficiently in a large database. This collection of relations represents a rich source of knowledge and may be useful for many researchers. We offer the tuple dataset for public download and an Application Programming Interface (API) that lets other researchers programmatically query the database.
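To make the taxonomy-construction use case mentioned above concrete, the toy sketch below builds a simple is-a chain from (hyponym, hypernym, frequency) tuples. The tuples, field layout, and selection heuristic are invented for illustration and are not the paper's extraction method.

# Toy tuples in the spirit of the extracted database; the real resource
# contains more than 400 million web-extracted, noisier relations.
tuples = [
    ("dog", "animal", 120),
    ("dog", "pet", 85),
    ("cat", "animal", 110),
    ("animal", "organism", 40),
]

# Keep, for each hyponym, its most frequent hypernym as one simple taxonomy layer.
best_hypernym = {}
for hypo, hyper, freq in tuples:
    if hypo not in best_hypernym or freq > best_hypernym[hypo][1]:
        best_hypernym[hypo] = (hyper, freq)

def ancestors(term):
    """Follow best-hypernym links upward, e.g. dog -> animal -> organism."""
    chain = []
    while term in best_hypernym:
        term = best_hypernym[term][0]
        chain.append(term)
    return chain

print(ancestors("dog"))  # ['animal', 'organism']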

Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
Marieke van Erp | Pablo Mendes | Heiko Paulheim | Filip Ilievski | Julien Plu | Giuseppe Rizzo | Joerg Waitelonis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Entity linking has become a popular task in both natural language processing and semantic web communities. However, we find that the benchmark datasets for entity linking tasks do not accurately evaluate entity linking systems. In this paper, we aim to chart the strengths and weaknesses of current benchmark datasets and sketch a roadmap for the community to devise better benchmark datasets.