Alex Papadopoulos Korfiatis


2022

Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
Francesco Moramarco | Alex Papadopoulos Korfiatis | Mark Perera | Damir Juric | Jack Flann | Ehud Reiter | Anya Belz | Aleksandar Savkov
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In recent years, machine learning models have rapidly become better at generating clinical consultation notes; yet, there is little work on how to properly evaluate the generated consultation notes to understand the impact they may have on both the clinician using them and the patient’s clinical safety. To address this we present an extensive human evaluation study of consultation notes where 5 clinicians (i) listen to 57 mock consultations, (ii) write their own notes, (iii) post-edit a number of automatically generated notes, and (iv) extract all the errors, both quantitative and qualitative. We then carry out a correlation study with 18 automatic quality metrics and the human judgements. We find that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics like BertScore. All our findings and annotations are open-sourced.

PriMock57: A Dataset Of Primary Care Mock Consultations
Alex Papadopoulos Korfiatis | Francesco Moramarco | Radmila Sarac | Aleksandar Savkov
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recent advances in Automatic Speech Recognition (ASR) have made it possible to reliably produce automatic transcripts of clinician-patient conversations. However, access to clinical datasets is heavily restricted due to patient privacy, thus slowing down normal research practices. We detail the development of a public-access, high-quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as consultation note generation from transcripts.

2021

A Preliminary Study on Evaluating Consultation Notes With Post-Editing
Francesco Moramarco | Alex Papadopoulos Korfiatis | Aleksandar Savkov | Ehud Reiter
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

Automatic summarisation has the potential to aid physicians in streamlining clerical tasks such as note taking. But it is notoriously difficult to evaluate these systems and demonstrate that they are safe to be used in a clinical setting. To circumvent this issue, we propose a semi-automatic approach whereby physicians post-edit generated notes before submitting them. We conduct a preliminary study on the time saving of automatically generated consultation notes with post-editing. Our evaluators are asked to listen to mock consultations and to post-edit three generated notes. We time this and find that it is faster than writing the note from scratch. We present insights and lessons learnt from this experiment.

2019

Multilingual Factor Analysis
Francisco Vargas | Kamen Brestnichki | Alex Papadopoulos Korfiatis | Nils Hammerla
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this work we approach the task of learning multilingual word representations in an offline manner by fitting a generative latent variable model to a multilingual dictionary. We model equivalent words in different languages as different views of the same word generated by a common latent variable representing their latent lexical meaning. We explore the task of alignment by querying the fitted model for multilingual embeddings, achieving competitive results across a variety of tasks. The proposed model is robust to noise in the embedding space, making it a suitable method for distributed representations learned from noisy corpora.