TY - GEN
T1 - Developing disease risk prediction model based on environmental factors
AU - Pak, Mingyu
AU - Shin, Miyoung
PY - 2014
Y1 - 2014
N2 - Analyzing the effects of various environmental factors on human diseases is an important issue in recent bioinformatics studies. In this paper, we investigate several environmental factors related to Type-2 diabetes and select a subset of them to develop an analytical model for disease risk prediction. To select significant factors, we first preprocessed all the environmental factors into categorical values and then calculated the max/min odds ratios of all the categorized environmental factors. We then chose the top-n ranked factors as input features for the prediction model. The disease risk prediction model was developed with SVM classifiers, with training data built from the Ansan/Ansung Cohort 2 Data obtained from the Korean National Institute of Health (KNIH). The training data exhibited a class imbalance problem, which is often observed in practice. To handle this problem, we regenerated the training data using the SMOTE approach and used them for disease risk prediction modeling. For model evaluation, the proposed method was employed to predict the risk of Type-2 diabetes. The experimental results showed that our SVM classifiers based on selected environmental factors could produce results comparable to those of a prediction model based on genetic factors in forecasting the risk of a specific disease.
AB - Analyzing the effects of various environmental factors on human diseases is an important issue in recent bioinformatics studies. In this paper, we investigate several environmental factors related to Type-2 diabetes and select a subset of them to develop an analytical model for disease risk prediction. To select significant factors, we first preprocessed all the environmental factors into categorical values and then calculated the max/min odds ratios of all the categorized environmental factors. We then chose the top-n ranked factors as input features for the prediction model. The disease risk prediction model was developed with SVM classifiers, with training data built from the Ansan/Ansung Cohort 2 Data obtained from the Korean National Institute of Health (KNIH). The training data exhibited a class imbalance problem, which is often observed in practice. To handle this problem, we regenerated the training data using the SMOTE approach and used them for disease risk prediction modeling. For model evaluation, the proposed method was employed to predict the risk of Type-2 diabetes. The experimental results showed that our SVM classifiers based on selected environmental factors could produce results comparable to those of a prediction model based on genetic factors in forecasting the risk of a specific disease.
KW - disease risk prediction
KW - Environmental-wide association study
KW - SVM classifiers
UR - http://www.scopus.com/inward/record.url?scp=84907322676&partnerID=8YFLogxK
U2 - 10.1109/ISCE.2014.6884338
DO - 10.1109/ISCE.2014.6884338
M3 - Conference contribution
AN - SCOPUS:84907322676
SN - 9781479945924
T3 - Proceedings of the International Symposium on Consumer Electronics, ISCE
BT - ISCE 2014 - 18th IEEE International Symposium on Consumer Electronics
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE International Symposium on Consumer Electronics, ISCE 2014
Y2 - 22 June 2014 through 25 June 2014
ER -