Aurore Bochnakian


Transitioning from benchmarks to a real-world case of information-seeking in Scientific Publications
Chyrine Tahri | Aurore Bochnakian | Patrick Haouat | Xavier Tannier
Findings of the Association for Computational Linguistics: ACL 2023

Although recent years have been marked by remarkable advances across the whole development process of NLP systems, there are still blind spots in characterizing what hampers real-world adoption of models in knowledge-intensive settings. In this paper, we illustrate, through a real-world zero-shot text search case for information seeking in scientific papers, the masked phenomena that the current process of measuring performance might not reflect, even when benchmarks appear to be faithfully representative of the task at hand. In addition to experimenting with TREC-COVID and NFCorpus, we provide an industrial case, carried out and annotated by experts, that studies vitamin B’s impact on health. We thus discuss the misalignment between focusing solely on single-metric performance as a criterion for model choice and relevancy as a subjective measure for meeting a user’s need.