Cascaded Modeling for PIMA Indian Diabetes Data

M.S. Barale; D.T. Shirke

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 139 - Issue 11

Published: April 2016

DOI: 10.5120/ijca2016909426

M.S. Barale, D.T. Shirke. Cascaded Modeling for PIMA Indian Diabetes Data. International Journal of Computer Applications. 139, 11 (April 2016), 1-4. DOI=10.5120/ijca2016909426

                        @article{ 10.5120/ijca2016909426,
                        author  = { M.S. Barale and D.T. Shirke },
                        title   = { Cascaded Modeling for PIMA Indian Diabetes Data },
                        journal = { International Journal of Computer Applications },
                        year    = { 2016 },
                        volume  = { 139 },
                        number  = { 11 },
                        pages   = { 1-4 },
                        doi     = { 10.5120/ijca2016909426 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2016
                        %A M.S. Barale
                        %A D.T. Shirke
                        %T Cascaded Modeling for PIMA Indian Diabetes Data
                        %J International Journal of Computer Applications
                        %V 139
                        %N 11
                        %P 1-4
                        %R 10.5120/ijca2016909426
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper develops cascaded models for classifying the PIMA Indian diabetes database. Missing values are first imputed with the k-nearest neighbour (k-NN) method, and the imputed data are then used for classification. Classification proceeds in two steps: in the first step, the k-means clustering algorithm extracts hidden patterns in the data set; in the second step, a suitable classifier performs the classification. Both k-means combined with an artificial neural network (ANN) classifier and k-means combined with a logistic regression classifier achieve classification accuracy above 98%.
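The imputation, clustering and classification steps described in the abstract can be sketched in standard-library Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The rows are made-up (glucose, BMI) pairs rather than PIMA records. Missing values are filled with the mean of the k nearest complete rows. The clustering step is interpreted as running k-means with k = 2 and discarding rows whose label disagrees with the majority label of their cluster, which is one common cascading scheme and is assumed here. A logistic regression fitted by batch gradient descent then serves as the second-stage classifier. All function names (`knn_impute`, `drop_misclustered`, `train_logreg`, and so on) are hypothetical.

```python
import math
import random


def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest complete
    rows; distance uses only the features the incomplete row actually has.
    Assumption: raw (unscaled) values and plain averaging of neighbours."""
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]
        nearest = sorted(complete, key=lambda c: sum((c[j] - r[j]) ** 2 for j in observed))[:k]
        filled.append([v if v is not None else sum(c[j] for c in nearest) / len(nearest)
                       for j, v in enumerate(r)])
    return filled


def standardize(X):
    """Z-score each column so no feature dominates Euclidean distances."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def kmeans(X, k=2, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the cluster index of every row."""
    centers = [list(x) for x in random.Random(seed).sample(X, k)]

    def nearest(x):
        return min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))

    for _ in range(iters):
        assign = [nearest(x) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(x) for x in X]


def drop_misclustered(X, y, assign):
    """Assumed cascading step: keep only rows whose label equals the
    majority label of the cluster they fell into."""
    majority = {}
    for c in set(assign):
        labels = [t for t, a in zip(y, assign) if a == c]
        majority[c] = max(set(labels), key=labels.count)
    keep = [i for i in range(len(X)) if y[i] == majority[assign[i]]]
    return [X[i] for i in keep], [y[i] for i in keep]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Binary logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0)


# Made-up (glucose, BMI) rows for illustration only, not PIMA records; None = missing.
rows = [[90, 24.0], [95, 26.0], [100, None], [105, 25.0], [110, 28.0], [120, 27.0],
        [150, 35.0], [160, None], [170, 38.0], [None, 36.0], [180, 40.0], [112, 27.5]]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

filled = knn_impute(rows, k=2)               # preprocessing: k-NN imputation
X = standardize(filled)
assign = kmeans(X, k=2)                      # step 1: k-means clustering
Xk, yk = drop_misclustered(X, labels, assign)
w, b = train_logreg(Xk, yk)                  # step 2: classifier on the kept rows
acc = sum(predict(w, b, x) == t for x, t in zip(Xk, yk)) / len(yk)
print(f"kept {len(yk)} of {len(labels)} rows; training accuracy {acc:.2f}")
```

In this toy data, the last row (label 1) sits among the label-0 rows, so the clustering filter discards it before the classifier is trained. Because the classifier is trained and scored on the same filtered rows, the printed figure is a training accuracy and not a held-out estimate. A more careful evaluation would run imputation, filtering, and classification inside cross-validation. An ANN could replace `train_logreg` without changing the other steps.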

Index Terms

Computer Science

Information Sciences

Keywords

Missing data, Clustering, Classification