Kai Hakala


2019

Biomedical Named Entity Recognition with Multilingual BERT
Kai Hakala | Sampo Pyysalo
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

We present the approach of the Turku NLP group to the PharmaCoNER task on Spanish biomedical named entity recognition. We apply a CRF-based baseline and multilingual BERT to the task. With BERT, we achieve an F-score of 88% on the development data and 87% on the test set. Our approach is a straightforward application of a state-of-the-art multilingual model that is not specifically tailored to either the language or the application domain. The source code is available at: https://github.com/chaanim/pharmaconer

2018

Evaluation of a Prototype System that Automatically Assigns Subject Headings to Nursing Narratives Using Recurrent Neural Network
Hans Moen | Kai Hakala | Laura-Maria Peltonen | Henry Suhonen | Petri Loukasmäki | Tapio Salakoski | Filip Ginter | Sanna Salanterä
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

We present our initial evaluation of a prototype system designed to assist nurses in assigning subject headings to nursing narratives written in the context of documenting patient care in hospitals. Currently, nurses may need to memorize several hundred subject headings from standardized nursing terminologies when structuring their text and assigning the right section/subject headings to it. Our aim is to allow nurses to write in a narrative manner without having to plan and structure the text with respect to sections and subject headings. Instead, the system should assist with assigning subject headings and restructuring the text afterwards. We hypothesize that this could reduce the time and effort needed for nursing documentation in hospitals. A central component of the system is a text classification model based on a long short-term memory (LSTM) recurrent neural network architecture, trained on a large data set of nursing notes. A simple Web-based interface has been implemented for user interaction. To evaluate the system, three nurses write a set of artificial nursing shift notes in a fully unstructured narrative manner, without planning for or considering the use of sections and subject headings. These notes are then fed to the system, which assigns subject headings to each sentence and then groups the sentences into paragraphs. Manual evaluation is conducted by a group of nurses. The results show that about 70% of the sentences are assigned to correct subject headings. The nurses believe that such a system can be of great help in making nursing documentation in hospitals easier and less time-consuming. Finally, various measures and approaches for improving the system are discussed.

2017

End-to-End System for Bacteria Habitat Extraction
Farrokh Mehryary | Kai Hakala | Suwisa Kaewphan | Jari Björne | Tapio Salakoski | Filip Ginter
BioNLP 2017

We introduce an end-to-end system capable of named-entity detection, normalization and relation extraction for extracting information about bacteria and their habitats from biomedical literature. Our system is based on deep learning, CRF classifiers and vector space models. We train and evaluate the system on the BioNLP 2016 Shared Task Bacteria Biotope data. The official evaluation shows that the joint performance of our entity detection and relation extraction models exceeds that of the winning team of the Shared Task by 19 percentage points in F1-score, establishing a new top score for the task. We also achieve state-of-the-art results in the normalization task. Our system is open source and freely available at https://github.com/TurkuNLP/BHE.

Detecting mentions of pain and acute confusion in Finnish clinical text
Hans Moen | Kai Hakala | Farrokh Mehryary | Laura-Maria Peltonen | Tapio Salakoski | Filip Ginter | Sanna Salanterä
BioNLP 2017

We study and compare two different approaches to the task of automatically assigning predefined classes to clinical free-text narratives. The first approach treats this as a traditional mention-level named-entity recognition task, while the second treats it as a sentence-level multi-label classification task. The two approaches are compared using sentence-level evaluation, and state-of-the-art methods are evaluated for both. The experiments are done on two data sets consisting of Finnish clinical text, manually annotated with respect to the topics pain and acute confusion. Our results suggest that the mention-level named-entity recognition approach outperforms sentence-level classification overall, but the latter approach still achieves the best prediction scores on several annotation classes.

2016

Syntactic analyses and named entity recognition for PubMed and PubMed Central — up-to-the-minute
Kai Hakala | Suwisa Kaewphan | Tapio Salakoski | Filip Ginter
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2015

UTU: Adapting Biomedical Event Extraction System to Disorder Attribute Detection
Kai Hakala
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Sharing annotations better: RESTful Open Annotation
Sampo Pyysalo | Jorge Campos | Juan Miguel Cejuela | Filip Ginter | Kai Hakala | Chen Li | Pontus Stenetorp | Lars Juhl Jensen
Proceedings of ACL-IJCNLP 2015 System Demonstrations

2014

UTU: Disease Mention Recognition and Normalization with CRFs and Vector Space Representations
Suwisa Kaewphan | Kai Hakala | Filip Ginter
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

EVEX in ST’13: Application of a large-scale text mining resource to event extraction and network construction
Kai Hakala | Sofie Van Landeghem | Tapio Salakoski | Yves Van de Peer | Filip Ginter
Proceedings of the BioNLP Shared Task 2013 Workshop