Research Article

Cascaded Modeling for PIMA Indian Diabetes Data

by M.S. Barale, D.T. Shirke
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 139 - Issue 11
Published: April 2016
DOI: 10.5120/ijca2016909426

M.S. Barale, D.T. Shirke. Cascaded Modeling for PIMA Indian Diabetes Data. International Journal of Computer Applications. 139, 11 (April 2016), 1-4. DOI=10.5120/ijca2016909426

@article{10.5120/ijca2016909426,
  author    = {M.S. Barale and D.T. Shirke},
  title     = {Cascaded Modeling for PIMA Indian Diabetes Data},
  journal   = {International Journal of Computer Applications},
  year      = {2016},
  volume    = {139},
  number    = {11},
  pages     = {1--4},
  doi       = {10.5120/ijca2016909426},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2016
%A M.S. Barale
%A D.T. Shirke
%T Cascaded Modeling for PIMA Indian Diabetes Data
%J International Journal of Computer Applications
%V 139
%N 11
%P 1-4
%R 10.5120/ijca2016909426
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper develops cascaded models for classifying the PIMA Indian diabetes database. The k-nearest neighbour method is used to impute the missing data, and the processed data are then classified in two steps. In the first step, the k-means clustering algorithm is used to extract hidden patterns in the data set; in the second step, the classification is carried out by a suitable classifier. Two cascades, k-means combined with an artificial neural network classifier and k-means combined with a logistic regression classifier, each achieve classification accuracy above 98%.
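
The abstract names the stages of the cascade but not their details. As a rough illustration only, the stages could be sketched in plain Python as below. Everything specific in the sketch is an assumption rather than the authors' method: the function names (`knn_impute`, `kmeans`, `cascade_filter`, `train_logistic`, `predict`), the toy two-feature data (not the PIMA data), k = 3 for imputation, two clusters, and above all the cascade rule, which is assumed here to discard samples whose class label disagrees with the majority class of their k-means cluster before the classifier is trained. Logistic regression stands in for the second-stage classifier.

```python
import math

def knn_impute(rows, k=3):
    """Replace None entries with the mean of that feature over the k nearest rows."""
    def dist(a, b):
        # Compare only the features that are observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                donors = sorted(
                    (dist(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                new_row[j] = sum(v for _, v in donors) / len(donors)
        filled.append(new_row)
    return filled

def kmeans(points, k=2, iters=20):
    """Plain k-means; farthest-point initialisation keeps the sketch deterministic."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign()

def cascade_filter(points, classes, clusters):
    """Assumed cascade rule: keep samples matching their cluster's majority class."""
    majority = {}
    for c in set(clusters):
        members = [y for y, cl in zip(classes, clusters) if cl == c]
        majority[c] = max(set(members), key=members.count)
    kept = [(p, y) for p, y, cl in zip(points, classes, clusters) if y == majority[cl]]
    return [p for p, _ in kept], [y for _, y in kept]

def train_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by stochastic gradient ascent; w[-1] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                w[j] += lr * (yi - p) * xj
            w[-1] += lr * (yi - p)
    return w

def predict(w, x):
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z >= 0 else 0

# Hypothetical toy data: two features, None marks a missing value.
raw = [
    [1.0, 1.2], [1.1, 0.9], [0.9, 1.0], [1.2, None], [0.8, 1.1],
    [3.0, 3.1], [3.2, 2.9], [None, 3.0], [2.9, 3.2], [3.1, 3.0],
    [2.9, 3.0],  # labelled 0 but located among the class-1 samples
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

filled = knn_impute(raw, k=3)                                # impute missing values
clusters = kmeans(filled, k=2)                               # step 1: clustering
X_kept, y_kept = cascade_filter(filled, labels, clusters)    # assumed cascade rule
weights = train_logistic(X_kept, y_kept)                     # step 2: classifier
accuracy = sum(predict(weights, x) == y for x, y in zip(X_kept, y_kept)) / len(y_kept)
print(len(X_kept), accuracy)
```

On this toy data the sample whose label conflicts with its cluster is dropped before training. The sketch says nothing about the accuracy reported in the paper, which depends on the actual data and evaluation protocol.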

Index Terms
Computer Science
Information Sciences
Keywords

Missing data, Clustering, Classification
