An Unsupervised Query Rewriting Approach Using N-gram Co-occurrence Statistics to Find Similar Phrases in Large Text Corpora
Hans Moen | Laura-Maria Peltonen | Henry Suhonen | Hanna-Maria Matinolli | Riitta Mieronkoski | Kirsi Telen | Kirsi Terho | Tapio Salakoski | Sanna Salanterä
Proceedings of the 22nd Nordic Conference on Computational Linguistics
We present our work towards developing a system that should find, in a large text corpus, contiguous phrases expressing similar meaning as a query phrase of arbitrary length. Depending on the use case, this task can be seen as a form of (phrase-level) query rewriting. The suggested approach works in a generative manner, is unsupervised and uses a combination of a semantic word n-gram model, a statistical language model and a document search engine. A central component is a distributional semantic model containing word n-grams vectors (or embeddings) which models semantic similarities between n-grams of different order. As data we use a large corpus of PubMed abstracts. The presented experiment is based on manual evaluation of extracted phrases for arbitrary queries provided by a group of evaluators. The results indicate that the proposed approach is promising and that the use of distributional semantic models trained with uni-, bi- and trigrams seems to work better than a more traditional unigram model.
Evaluation of a Prototype System that Automatically Assigns Subject Headings to Nursing Narratives Using Recurrent Neural Network
Hans Moen | Kai Hakala | Laura-Maria Peltonen | Henry Suhonen | Petri Loukasmäki | Tapio Salakoski | Filip Ginter | Sanna Salanterä
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
We present our initial evaluation of a prototype system designed to assist nurses in assigning subject headings to nursing narratives – written in the context of documenting patient care in hospitals. Currently nurses may need to memorize several hundred subject headings from standardized nursing terminologies when structuring and assigning the right section/subject headings to their text. Our aim is to allow nurses to write in a narrative manner without having to plan and structure the text with respect to sections and subject headings, instead the system should assist with the assignment of subject headings and restructuring afterwards. We hypothesize that this could reduce the time and effort needed for nursing documentation in hospitals. A central component of the system is a text classification model based on a long short-term memory (LSTM) recurrent neural network architecture, trained on a large data set of nursing notes. A simple Web-based interface has been implemented for user interaction. To evaluate the system, three nurses write a set of artificial nursing shift notes in a fully unstructured narrative manner, without planning for or consider the use of sections and subject headings. These are then fed to the system which assigns subject headings to each sentence and then groups them into paragraphs. Manual evaluation is conducted by a group of nurses. The results show that about 70% of the sentences are assigned to correct subject headings. The nurses believe that such a system can be of great help in making nursing documentation in hospitals easier and less time consuming. Finally, various measures and approaches for improving the system are discussed.
- Hans Moen 2
- Laura-Maria Peltonen 2
- Tapio Salakoski 2
- Sanna Salanterä 2
- Hanna-Maria Matinolli 1
- show all...