Evaluation Set for Slovak News Information Retrieval

Daniel Hládek, Jan Staš, Jozef Juhár


Abstract
This work proposes an information retrieval evaluation set for the Slovak language. A set of 80 queries written in natural language is given together with the set of relevant documents. The document set contains 3980 newspaper articles sorted into 6 categories. Each document in the result set is manually annotated for relevance to its corresponding query. The evaluation set is largely compatible with the Cranfield test collection and follows the same methodology for queries and relevance annotation. In addition, it provides annotations for document title, author, publication date and category, which can be used to evaluate automatic document clustering and categorization.
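As an illustration of how a Cranfield-style set of queries and relevance judgments is typically used, the following is a minimal sketch that scores a ranked result list against the annotated relevant documents. The document IDs, the in-memory data layout and all function names are assumptions made for this example; they are not taken from the paper or from the distributed evaluation set.

```python
# Minimal sketch of Cranfield-style scoring. All identifiers and the
# dict/set layout are hypothetical, chosen only for illustration.

def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k documents of a ranked list."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """Average of per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(query_id, []), relevant)
              for query_id, relevant in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relevance judgments (query -> relevant document IDs)
# and one system's ranked output (query -> ranked document IDs).
qrels = {"q1": {"d1", "d2"}}
runs = {"q1": ["d3", "d1", "d7", "d2"]}

p_at_2, r_at_2 = precision_recall_at_k(runs["q1"], qrels["q1"], 2)
map_score = mean_average_precision(runs, qrels)
```

With real data, qrels would hold the manually annotated relevant documents for each of the 80 queries, and runs would hold the output of the retrieval system under test.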
Anthology ID:
L16-1302
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1913–1916
URL:
https://aclanthology.org/L16-1302
Cite (ACL):
Daniel Hládek, Jan Staš, and Jozef Juhár. 2016. Evaluation Set for Slovak News Information Retrieval. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1913–1916, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Evaluation Set for Slovak News Information Retrieval (Hládek et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1302.pdf