Harald Sack


2024

How to Turn Card Catalogs into LLM Fodder
Mary Ann Tan | Shufan Jiang | Harald Sack
Proceedings of the Workshop on Deep Learning and Linked Data (DLnLD) @ LREC-COLING 2024

Bibliographical metadata collections describing pre-modern objects suffer from incompleteness and inaccuracies. This hampers the identification of literary works. In addition, titles often contain voluminous descriptive texts that do not adhere to contemporary title conventions. This paper explores several NLP approaches where greater textual length in titles is leveraged to enhance descriptive information.

2016

Crowdsourced Corpus with Entity Salience Annotations
Milan Dojchinovski | Dinesh Reddy | Tomáš Kliegr | Tomáš Vitvar | Harald Sack
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present a crowdsourced dataset that adds entity salience (importance) annotations to the Reuters-128 dataset, which is a subset of Reuters-21578. The dataset is distributed under a free license and published in the NLP Interchange Format, which fosters interoperability and re-use. We show the potential of the dataset on the task of learning an entity salience classifier and report on the results from several experiments.