Jean-Pierre Chevallet
2020
WIKIR: A Python Toolkit for Building a Large-scale Wikipedia-based English Information Retrieval Dataset
Jibril Frej | Didier Schwab | Jean-Pierre Chevallet
Proceedings of the Twelfth Language Resources and Evaluation Conference
In recent years, deep learning methods have achieved new state-of-the-art results in ad-hoc information retrieval. However, such methods usually require large amounts of annotated data to be effective. Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g. Robust04, ClueWeb09) have at most 250 annotated queries, recent deep learning models for information retrieval perform poorly on these datasets. These models (e.g. DUET, Conv-KNRM) are trained and evaluated on data from commercial search engines that is not publicly available for academic research, which hinders reproducibility and the advancement of research. In this paper, we propose WIKIR: an open-source toolkit to automatically build large-scale English information retrieval datasets based on Wikipedia. WIKIR is publicly available on GitHub. We also provide wikIR59k: a large-scale publicly available dataset that contains 59,252 queries and 2,617,003 (query, relevant document) pairs.
2015
Recherche d’information sémantique : état des lieux [Semantic information retrieval: a state of the art]
Haïfa Zargayouna | Catherine Roussey | Jean-Pierre Chevallet
Traitement Automatique des Langues, Volume 56, Numéro 3 : Recherche d'Information [Information Retrieval]
2012
Constructing Reference Semantic Predictions from Biomedical Knowledge Sources
Demeke Ayele | Jean-Pierre Chevallet | Million Meshesha | Getnet Kassie
Proceedings of COLING 2012