Prediction Models for Risk of Type-2 Diabetes Using Health Claims

Masatoshi Nagata, Kohichi Takai, Keiji Yasuda, Panikos Heracleous, Akio Yoneyama


Abstract
This study focuses on accurately predicting the onset of type-2 diabetes. We investigated whether prediction accuracy can be improved by combining lab test data obtained from health checkups with health claim text data, such as medically diagnosed diseases with ICD10 codes and pharmacy information. In a previous study, prediction accuracy increased slightly when diagnosed disease names and independent variables such as prescription medicines were added. In the current study, we therefore explored more suitable prediction models using state-of-the-art techniques such as XGBoost and long short-term memory (LSTM) recurrent neural networks. Text data were vectorized using word2vec, and the prediction models were compared with logistic regression. The results confirmed that the onset of type-2 diabetes can be predicted with a high degree of accuracy when the XGBoost model is used.
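As a rough illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of the logistic regression baseline only, not the XGBoost or LSTM models. It assumes that each patient's ICD10 codes are mapped to embedding vectors, that those vectors are averaged into one fixed-length vector, and that this vector is concatenated with a numeric lab-test value. The pooling scheme, the embedding values, the single lab feature, the toy patients, and all function names are hypothetical and are not taken from the paper; in practice the embeddings would come from a word2vec model trained on claim text.

```python
import math

# Hypothetical 2-dimensional embeddings for a few ICD10 codes.
# The numbers are made up for illustration, not learned from data.
EMBEDDINGS = {
    "E66": [0.9, 0.1],    # obesity
    "I10": [0.8, 0.3],    # essential hypertension
    "J06": [-0.7, 0.2],   # acute upper respiratory infection
    "M54": [-0.6, -0.4],  # dorsalgia
}


def mean_vector(codes, dim=2):
    """Average the embeddings of the known codes; return zeros if none are known."""
    known = [EMBEDDINGS[c] for c in codes if c in EMBEDDINGS]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def features(lab_values, codes):
    """Concatenate numeric lab-test values with the pooled code vector."""
    return list(lab_values) + mean_vector(codes)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Predicted probability of onset for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy patients: (standardized lab value, ICD10 codes, onset label). All hypothetical.
PATIENTS = [
    ([0.8], ["E66", "I10"], 1),
    ([0.5], ["I10"], 1),
    ([-0.6], ["J06"], 0),
    ([-0.3], ["M54", "J06"], 0),
]

X = [features(lab, codes) for lab, codes, _ in PATIENTS]
y = [label for _, _, label in PATIENTS]
w, b = fit_logistic(X, y)
```

Calling `predict_proba(w, b, features([0.7], ["E66"]))` then gives the fitted model's estimated risk for a new toy patient. A gradient-boosted model such as XGBoost could be trained on the same feature vectors; the sketch stops at the baseline to stay dependency-free.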
Anthology ID:
W18-2322
Volume:
Proceedings of the BioNLP 2018 workshop
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venues:
ACL | BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
172–176
URL:
https://aclanthology.org/W18-2322
DOI:
10.18653/v1/W18-2322
Cite (ACL):
Masatoshi Nagata, Kohichi Takai, Keiji Yasuda, Panikos Heracleous, and Akio Yoneyama. 2018. Prediction Models for Risk of Type-2 Diabetes Using Health Claims. In Proceedings of the BioNLP 2018 workshop, pages 172–176, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Prediction Models for Risk of Type-2 Diabetes Using Health Claims (Nagata et al., 2018)
PDF:
https://aclanthology.org/W18-2322.pdf