Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020)
Maite Melero (Editor)
Electronic Health Records are a valuable source of patient information which can be leveraged to detect Adverse Drug Events (ADEs) and aid post-mark drug-surveillance. The overall aim of this study is to scrutinize text written by clinicians in the EHRs and build a model for ADE detection that produces medically relevant predictions. Natural Language Processing techniques will be exploited to create important predictors and incorporate them into the learning process. The study focuses on the 5 most frequent ADE cases found ina Swedish electronic patient record corpus. The results indicate that considering textual features, rather than the structured, can improve the classification performance by 15% in some ADE cases. Additionally, variable patient history lengths are incorporated in the models, demonstrating the importance of the above decision rather than using an arbitrary number for a history length. The experimental findings suggest that the clinical text in EHRs includes information that can capture data beyond the ones that are found in a structured format.
We present a large Norwegian lexical resource of categorized medical terms. The resource, which merges information from large medical databases, contains over 56,000 entries, including automatically mapped terms from a Norwegian medical dictionary. We describe the methodology behind this automatic dictionary entry mapping based on keywords and suffixes and further present the results of a manual evaluation performed on a subset by a domain expert. The evaluation indicated that ca. 80% of the mappings were correct.
Medical language exhibits great variations regarding users, institutions and language registers. With large parts of clinical documents in free text, NLP is playing a more and more important role in unlocking re-usable and interoperable meaning from medical records. This study describes the architectural principles and the evolution of a German interface vocabulary, combining machine translation with human annotation and rule-based term generation, yielding a resource with 7.7 million raw entries, each of which linked to the reference terminology SNOMED CT, an international standard with about 350 thousand concepts. The purpose is to offer a high coverage of medical jargon in order to optimise terminology grounding of clinical texts by text mining systems. The core resource is a manually curated table of English-to-German word and chunk translations, supported by a set of language generation rules. The work describes a workflow consisting the enrichment and modification of this table with human and machine efforts, manually enriched by grammarspecific tags. Top-down and bottom-up methods for terminology population used in parallel. The final interface terms are produced by a term generator, which creates one-to-many German variants per SNOMED CT English description. Filtering against a large collection of domain terminologies and corpora drastically reduces the size of the vocabulary in favour of more realistic terms or terms that can reasonably be expected to match clinical text passages within a text-mining pipeline. An evaluation was performed by a comparison between the current version of the German interface vocabulary and the English description table of the SNOMED CT International release. An exact term matching was performed with a small parallel corpus constituted by text snippets from different clinical documents. With overall low retrieval parameters (with F-values around 30%), the performance of the German language scenario reaches 80 – 90% of the English one. Interestingly, annotations are slightly better with machine-translated (German – English) texts, using the International SNOMED CT resource only.
Translating biomedical ontologies is an important challenge, but doing it manually requires much time and money. We study the possibility to use open-source knowledge bases to translate biomedical ontologies. We focus on two aspects: coverage and quality. We look at the coverage of two biomedical ontologies focusing on diseases with respect to Wikidata for 9 European languages (Czech, Dutch, English, French, German, Italian, Polish, Portuguese and Spanish) for both, plus Arabic, Chinese and Russian for the second. We first use direct links between Wikidata and the studied ontologies and then use second-order links by going through other intermediate ontologies. We then compare the quality of the translations obtained thanks to Wikidata with a commercial machine translation tool, here Google Cloud Translation.
Transfer learning applied to text classification in Spanish radiological reports
Pilar López Úbeda | Manuel Carlos Díaz-Galiano | L. Alfonso Urena Lopez | Maite Martin | Teodoro Martín-Noguerol | Antonio Luna
Pre-trained text encoders have rapidly advanced the state-of-the-art on many Natural Language Processing tasks. This paper presents the use of transfer learning methods applied to the automatic detection of codes in radiological reports in Spanish. Assigning codes to a clinical document is a popular task in NLP and in the biomedical domain. These codes can be of two types: standard classifications (e.g. ICD-10) or specific to each clinic or hospital. In this study we show a system using specific radiology clinic codes. The dataset is composed of 208,167 radiology reports labeled with 89 different codes. The corpus has been evaluated with three methods using the BERT model applied to Spanish: Multilingual BERT, BETO and XLM. The results are interesting obtaining 70% of F1-score with a pre-trained multilingual model.
The Platform for Automated extraction of animal Disease Information from the web (PADI-web) is an automated system which monitors the web for monitoring and detecting emerging animal infectious diseases. The tool automatically collects news via customised multilingual queries, classifies them and extracts epidemiological information. We detail the processing of multilingual online sources by PADI-web and analyse the translated outputs in a case study