Jack Collins


2024

pdf bib
Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings
Mojtaba Yousefi | Jack Collins
Proceedings of the 1st Workshop on NLP for Science (NLP4Science)

This study examines the alignment of Conference on Computer Vision and Pattern Recognition (CVPR) research with the principles of the “bitter lesson” proposed by Rich Sutton. We analyze two decades of CVPR abstracts and titles using large language models (LLMs) to assess the field’s embracement of these principles. Our methodology leverages state-of-the-art natural language processing techniques to systematically evaluate the evolution of research approaches in computer vision. The results reveal significant trends in the adoption of general-purpose learning algorithms and the utilization of increased computational resources. We discuss the implications of these findings for the future direction of computer vision research and its potential impact on broader artificial intelligence development. This work contributes to the ongoing dialogue about the most effective strategies for advancing machine learning and computer vision, offering insights that may guide future research priorities and methodologies in the field.

2023

pdf bib
BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance
Karel D’Oosterlinck | François Remy | Johannes Deleu | Thomas Demeester | Chris Develder | Klim Zaporojets | Aneiss Ghodsi | Simon Ellershaw | Jack Collins | Christopher Potts
Findings of the Association for Computational Linguistics: EMNLP 2023

Timely and accurate extraction of Adverse Drug Events (ADE) from biomedical literature is paramount for public safety, but involves slow and costly manual labor. We set out to improve drug safety monitoring (pharmacovigilance, PV) through the use of Natural Language Processing (NLP). We introduce BioDEX, a large-scale resource for Biomedical adverse Drug Event eXtraction, rooted in the historical output of drug safety reporting in the U.S. BioDEX consists of 65k abstracts and 19k full-text biomedical papers with 256k associated document-level safety reports created by medical experts. The core features of these reports include the reported weight, age, and biological sex of a patient, a set of drugs taken by the patient, the drug dosages, the reactions experienced, and whether the reaction was life threatening. In this work, we consider the task of predicting the core information of the report given its originating paper. We estimate human performance to be 72.0% F1, whereas our best model achieves 59.1% F1 (62.3 validation), indicating significant headroom. We also begin to explore ways in which these models could help professional PV reviewers. Our code and data are available at https://github.com/KarelDO/BioDEX.