Anna Koroleva


2020

DeSpin: a prototype system for detecting spin in biomedical publications
Anna Koroleva | Sanjay Kamath | Patrick Bossuyt | Patrick Paroubek
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Improving the quality of medical research reporting is crucial to reduce avoidable waste in research and to improve the quality of health care. Despite various initiatives aiming at improving research reporting – guidelines, checklists, authoring aids, peer review procedures, etc. – overinterpretation of research results, also known as spin, is still a serious issue in research reporting. In this paper, we propose a Natural Language Processing (NLP) system for detecting several types of spin in biomedical articles reporting randomized controlled trials (RCTs). We use a combination of rule-based and machine learning approaches to extract important information on trial design and to detect potential spin. The proposed spin detection system includes algorithms for text structure analysis, sentence classification, entity and relation extraction, and semantic similarity assessment. Our algorithms achieved operational performance for these tasks, with F-measures ranging from 79.42% to 97.86% depending on the task. The most difficult task is extracting reported outcomes. Our tool is intended to be used as a semi-automated aid for assisting both authors and peer reviewers in detecting potential spin. The tool incorporates a simple interface that allows users to run the algorithms and visualize their output. It can also be used for manual annotation and for correcting errors in the outputs. The proposed tool is the first tool for spin detection. The tool and the annotated dataset are freely available.
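As a hedged illustration of one subtask mentioned in the abstract (semantic similarity assessment), the sketch below compares a declared primary outcome against reported outcomes using TF-IDF cosine similarity to flag a possible outcome switch. This is not the authors' implementation; the example outcomes and the similarity threshold are hypothetical.

```python
# Minimal sketch (not the DeSpin code): measure semantic similarity between a
# declared (registered) primary outcome and the outcomes reported in an
# abstract. Character n-gram TF-IDF cosine similarity is used purely for
# illustration; the threshold below is a hypothetical cut-off.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

declared_outcome = "overall survival at 12 months"
reported_outcomes = [
    "progression-free survival at 6 months",   # hypothetical example
    "overall survival at twelve months",       # hypothetical example
]

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
matrix = vectorizer.fit_transform([declared_outcome] + reported_outcomes)
similarities = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

THRESHOLD = 0.5  # hypothetical value, would need tuning on annotated data
for outcome, score in zip(reported_outcomes, similarities):
    status = "likely the same outcome" if score >= THRESHOLD else "possible outcome switch"
    print(f"{score:.2f}  {outcome}  ->  {status}")
```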

2019

Extracting relations between outcomes and significance levels in Randomized Controlled Trials (RCTs) publications
Anna Koroleva | Patrick Paroubek
Proceedings of the 18th BioNLP Workshop and Shared Task

Randomized controlled trials assess the effects of an experimental intervention by comparing it to a control intervention with regard to certain variables, the trial outcomes. Statistical hypothesis testing is used to test whether the experimental intervention is superior to the control. Statistical significance is typically reported for the measured outcomes and is an important characteristic of the results. We propose a machine learning approach to automatically extract reported outcomes, significance levels and the relation between them. We annotated a corpus of 663 sentences with 2,552 outcome–significance level relations (1,372 positive and 1,180 negative relations). We compared several classifiers using a manually crafted feature set, as well as a number of deep learning models. The best performance (F-measure of 94%) was shown by the fine-tuned BioBERT model.
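For illustration only, the sketch below shows how an outcome / significance-level pair could be cast as binary sequence classification with a BioBERT checkpoint. The checkpoint name, the entity-marker scheme, and the example sentence are assumptions, not the paper's actual setup.

```python
# Minimal sketch (not the paper's implementation): classify whether a marked
# outcome and a marked significance level in the same sentence are related.
# The publicly released BioBERT weights are loaded with a fresh, untrained
# classification head, so the prediction here is meaningless until fine-tuning.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical candidate relation: the outcome and the significance level are
# wrapped in markers so the classifier can attend to the pair in question.
sentence = ("<out> Overall survival </out> was longer in the treatment arm "
            "( <sig> p = 0.03 </sig> ).")

inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
predicted = logits.argmax(dim=-1).item()  # 1 = related, 0 = unrelated (untrained head)
print("related" if predicted == 1 else "unrelated")
```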

2018

Annotating Spin in Biomedical Scientific Publications: the case of Random Controlled Trials (RCTs)
Anna Koroleva | Patrick Paroubek
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Vers détection automatique des affirmations inappropriées dans les articles scientifiques (Towards automatic detection of inadequate claims in scientific articles)
Anna Koroleva
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. 19es REncontres jeunes Chercheurs en Informatique pour le TAL (RECITAL 2017)

In this article, we consider the contribution of Natural Language Processing (NLP) to the problem of automatically detecting the embellishment of research results, known as "spin", in scientific publications in the biomedical domain. We aim to identify inappropriate claims in articles, that is, claims in which the positive effect of the studied treatment is presented as greater than what the research actually demonstrated. After describing the problem from an NLP point of view, we present the research directions that seem most promising for automating spin detection. We then analyse the state of the art on comparable tasks and present the first results obtained in our project with baseline methods (local grammars) for the task of extracting the entities specific to our objective.
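As a hedged sketch of what a rule-based baseline for extracting one type of task-specific entity might look like (this is not the project's actual local grammars), the following regular expression pulls reported significance levels (p-values) out of a sentence; the pattern and the example sentence are illustrative only.

```python
# Minimal sketch (not the project's grammars): a regular-expression baseline
# for extracting reported significance levels (p-values) from trial abstracts.
import re

P_VALUE = re.compile(r"\b[pP]\s*[<>=]{1,2}\s*0?\.\d+\b")  # e.g. "p < 0.05", "P=0.001"

sentence = ("The primary outcome improved in the intervention group "
            "(p = 0.03), with no difference in adverse events (p > 0.10).")

for match in P_VALUE.finditer(sentence):
    print(match.group(0))
```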