Klaar Vanopstal


2010

Assessing the Impact of English Language Skills and Education Level on PubMed Searches by Dutch-speaking Users
Klaar Vanopstal | Robert Vander Stichele | Godelieve Laureys | Joost Buysschaert
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The aim of this study was to assess the retrieval effectiveness of nursing students in the Dutch-speaking part of Belgium. We tested two groups: students from the Master of Nursing and Midwifery program and students from the Bachelor of Nursing program. The test consisted of five parts. First, the students completed a questionnaire about their computer skills, their experience with PubMed and their self-assessed language skills. Second, they received an introduction to the use of MeSH in PubMed, followed by a PubMed search. After the literature search, they completed a second questionnaire asking for their opinion of the test. Finally, they took an official language test. The results of the PubMed search, i.e. the list of articles the students deemed relevant for a particular question, were compared to a gold standard. Precision, recall and F-score were calculated to evaluate the effectiveness of the PubMed search. We used information from the search process, such as search term formulation and MeSH term selection, to evaluate that process, and examined how these related to the results of the language test and the level of education.
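The comparison against a gold standard described in this abstract can be illustrated with a minimal sketch. It assumes each student's result and the gold standard are plain sets of article identifiers. The function name and the IDs are hypothetical, and the abstract does not say which F-score weighting was used, so the balanced F1 is assumed here.

```python
def precision_recall_f1(selected, gold):
    """Score one student's article list against a gold-standard list."""
    selected, gold = set(selected), set(gold)
    true_positives = len(selected & gold)
    # Precision: share of the selected articles that are in the gold standard.
    precision = true_positives / len(selected) if selected else 0.0
    # Recall: share of the gold-standard articles that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # Balanced F-score (F1); an assumption, the abstract does not name the weighting.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical article identifiers, for illustration only.
student_selection = ["101", "102", "103", "104"]
gold_standard = ["102", "103", "105"]
p, r, f = precision_recall_f1(student_selection, gold_standard)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy case two of the four selected articles are relevant and two of the three gold-standard articles were found, so precision is 0.50 and recall is about 0.67.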

Towards a Learning Approach for Abbreviation Detection and Resolution.
Klaar Vanopstal | Bart Desmet | Véronique Hoste
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The explosion of biomedical literature, and with it the uncontrolled creation of abbreviations, presents special challenges for both human readers and computer applications. We developed an annotated corpus of Dutch medical text and experimented with two approaches to abbreviation detection and resolution. Our corpus is composed of abstracts from two medical journals from the Low Countries, in which approximately 65 percent (NTvG) and 48 percent (TvG) of the abbreviations have a corresponding full form in the abstract. Our first approach, a pattern-based system, consists of two steps: abbreviation detection and definition matching. This system obtains an average F-score of 0.82 for the detection of both defined and undefined abbreviations, and an average F-score of 0.77 for the definitions. For our second approach, an SVM-based classifier was applied to the preprocessed data sets, yielding an average F-score of 0.93 for the abbreviations and 0.82 for the definitions.
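The two-step idea of abbreviation detection followed by definition matching can be sketched roughly as follows. This is not the authors' system: the regular expression, the "long form (ABBR)" heuristic, the function names and the example sentence are all assumptions chosen for illustration.

```python
import re

# Assumed detection pattern: a run of 2 to 6 capital letters as a whole word.
ABBREVIATION_RE = re.compile(r"\b[A-Z]{2,6}\b")


def detect_abbreviations(text):
    """Step 1: return candidate abbreviations, defined or not."""
    return sorted(set(ABBREVIATION_RE.findall(text)))


def match_definition(text, abbreviation):
    """Step 2: look for 'long form (ABBR)' where the initials of the
    preceding words spell the abbreviation; return None if not found."""
    position = text.find(f"({abbreviation})")
    if position == -1:
        return None
    words = text[:position].split()
    candidate = words[-len(abbreviation):]
    if len(candidate) != len(abbreviation):
        return None
    initials_match = all(
        word[0].lower() == letter.lower()
        for word, letter in zip(candidate, abbreviation)
    )
    return " ".join(candidate) if initials_match else None


# Hypothetical Dutch sentence: BMI is defined in the text, CRP is not.
sentence = "De body mass index (BMI) werd gemeten; de CRP was verhoogd."
for abbreviation in detect_abbreviations(sentence):
    print(abbreviation, "->", match_definition(sentence, abbreviation))
```

A heuristic this simple misses abbreviations whose letters do not map one-to-one onto word initials, which is one reason a learned classifier can outperform hand-written patterns.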

2008

Learning-based Detection of Scientific Terms in Patient Information
Véronique Hoste | Els Lefever | Klaar Vanopstal | Isabelle Delaere
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we investigate the use of a machine learning-based approach to the specific problem of scientific term detection in patient information. Lacking lexical databases that differentiate between the scientific and popular nature of medical terms, we used local context, morphosyntactic, morphological and statistical information to design a learner that accurately detects scientific medical terms. This study is the first step towards the automatic replacement of a scientific term by its popular counterpart, which should have a beneficial effect on readability. We report an F-score of 84% for the prediction of scientific terms in an English and Dutch EPAR corpus. Since recasting the term extraction problem as a classification problem leads to a heavily skewed data set, we rebalanced the data set by applying some simple TF-IDF-based and log-likelihood-based filters. We show that filtering indeed has a beneficial effect on the learner’s performance. However, the results of the filtering approach combined with the learning-based approach remain below those of the learning-based approach alone.
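As an illustration of what a log-likelihood-based filter for rebalancing might look like, the sketch below scores each candidate word by comparing its frequency in a domain corpus with its frequency in a general reference corpus, and keeps only words that are significantly over-represented in the domain corpus. The abstract does not specify how the filters were built, so the formula variant (the common two-corpus log-likelihood ratio), the threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom), the function names and all counts are assumptions.

```python
import math


def log_likelihood(freq_domain, size_domain, freq_reference, size_reference):
    """Two-corpus log-likelihood ratio for one word."""
    total_freq = freq_domain + freq_reference
    total_size = size_domain + size_reference
    expected_domain = size_domain * total_freq / total_size
    expected_reference = size_reference * total_freq / total_size
    score = 0.0
    # A zero observed count contributes nothing to the sum.
    if freq_domain > 0:
        score += freq_domain * math.log(freq_domain / expected_domain)
    if freq_reference > 0:
        score += freq_reference * math.log(freq_reference / expected_reference)
    return 2 * score


def filter_candidates(counts, size_domain, size_reference, threshold=3.84):
    """Keep words over-represented in the domain corpus (assumed criterion)."""
    kept = []
    for word, (freq_domain, freq_reference) in counts.items():
        over_represented = (
            freq_domain / size_domain > freq_reference / size_reference
        )
        score = log_likelihood(
            freq_domain, size_domain, freq_reference, size_reference
        )
        if over_represented and score >= threshold:
            kept.append(word)
    return kept


# Hypothetical counts: word -> (frequency in domain corpus, frequency in reference corpus).
counts = {"hepatic": (50, 5), "patient": (10, 100)}
print(filter_candidates(counts, size_domain=10_000, size_reference=100_000))
```

Words that are equally frequent, relative to corpus size, in both corpora score zero and are dropped, which removes many negative instances before classification.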