Iterative development of family history annotation guidelines using a synthetic corpus of clinical text

Taraka Rama, Pål Brekke, Øystein Nytrø, Lilja Øvrelid


Abstract
In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients’ family history relating to cases of cardiac disease, and present a general methodology that integrates the production of synthetic clinical statements with guideline development. We analyze inter-annotator agreement based on the developed guidelines and present results from experiments aimed at evaluating the validity and applicability of the annotated corpus using machine learning techniques. The resulting annotated corpus contains 477 sentences and 6030 tokens. Both the annotation guidelines and the annotated corpus are made freely available and, as such, constitute the first publicly available resource of Norwegian clinical text.
Anthology ID: W18-5613
Volume: Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Alberto Lavelli, Anne-Lyse Minard, Fabio Rinaldi
Venue: Louhi
Publisher: Association for Computational Linguistics
Pages: 111–121
URL: https://aclanthology.org/W18-5613
DOI: 10.18653/v1/W18-5613
Cite (ACL): Taraka Rama, Pål Brekke, Øystein Nytrø, and Lilja Øvrelid. 2018. Iterative development of family history annotation guidelines using a synthetic corpus of clinical text. In Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, pages 111–121, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Iterative development of family history annotation guidelines using a synthetic corpus of clinical text (Rama et al., Louhi 2018)
PDF: https://aclanthology.org/W18-5613.pdf