Jakub Zavrel


2022

Multi-objective Representation Learning for Scientific Document Retrieval
Mathias Parisot | Jakub Zavrel
Proceedings of the Third Workshop on Scholarly Document Processing

Existing dense retrieval models for scientific documents have been optimized either for retrieval by short queries or for document similarity, but usually not for both. In this paper, we explore the space of combining multiple objectives to achieve a single representation model that presents a good balance between both modes of dense retrieval, combining the relevance judgements from MS MARCO with the citation similarity of SPECTER, and the self-supervised objective of independent cropping. We also consider the addition of training data from document co-citation in a sentence context and domain-specific synthetic data. We show that combining multiple objectives yields models that generalize well across different benchmark tasks, improving by up to 73% over models trained on a single objective.

2020

Effective distributed representations for academic expert search
Mark Berger | Jakub Zavrel | Paul Groth
Proceedings of the First Workshop on Scholarly Document Processing

Expert search aims to find and rank experts based on a user’s query. In academia, retrieving experts is an efficient way to navigate through a large amount of academic knowledge. Here, we study how different distributed representations of academic papers (i.e. embeddings) impact academic expert retrieval. We use the Microsoft Academic Graph dataset and experiment with different configurations of a document-centric voting model for retrieval. In particular, we explore the impact of the use of contextualized embeddings on search performance. We also present results for paper embeddings that incorporate citation information through retrofitting. Additionally, experiments are conducted using different techniques for assigning author weights based on author order. We observe that using contextual embeddings produced by a transformer model trained for sentence similarity tasks produces the most effective paper representations for document-centric expert retrieval. However, retrofitting the paper embeddings and using elaborate author contribution weighting strategies did not improve retrieval performance.

A New Neural Search and Insights Platform for Navigating and Organizing AI Research
Marzieh Fadaee | Olga Gureenkova | Fernando Rejon Barrera | Carsten Schnober | Wouter Weerkamp | Jakub Zavrel
Proceedings of the First Workshop on Scholarly Document Processing

To provide AI researchers with modern tools for dealing with the explosive growth of the research literature in their field, we introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature. The system provides search at multiple levels of textual granularity, from sentences to aggregations across documents, both in natural language and through navigation in a domain-specific Knowledge Graph. We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.

2007

Learning to Compose Effective Strategies from a Library of Dialogue Components
Martijn Spitters | Marco De Boni | Jakub Zavrel | Remko Bonnema
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2001

Improving Accuracy in Word Class Tagging through the Combination of Machine Learning Systems
Hans van Halteren | Jakub Zavrel | Walter Daelemans
Computational Linguistics, Volume 27, Number 2, June 2001

2000

Genetic Algorithms for Feature Relevance Assignment in Memory-Based Language Processing
Anne Kool | Walter Daelemans | Jakub Zavrel
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop

Morphosyntactic Tagging of Slovene: Evaluating Taggers and Tagsets
Sašo Džeroski | Tomaž Erjavec | Jakub Zavrel
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers
Jakub Zavrel | Walter Daelemans
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Part of Speech Tagging and Lemmatisation for the Spoken Dutch Corpus
Frank Van Eynde | Jakub Zavrel | Walter Daelemans
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

Improving Data Driven Wordclass Tagging by System Combination
Hans van Halteren | Jakub Zavrel | Walter Daelemans
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Improving Data Driven Wordclass Tagging by System Combination
Hans van Halteren | Jakub Zavrel | Walter Daelemans
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

1997

Resolving PP attachment Ambiguities with Memory-Based Learning
Jakub Zavrel | Walter Daelemans | Jorn Veenstra
CoNLL97: Computational Natural Language Learning

Memory-Based Learning: Using Similarity for Smoothing
Jakub Zavrel | Walter Daelemans
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

MBT: A Memory-Based Part of Speech Tagger-Generator
Walter Daelemans | Jakub Zavrel | Peter Berck | Steven Gillis
Fourth Workshop on Very Large Corpora