A New Method for Dimensionality Reduction using K-Means Clustering Algorithm for High Dimensional Data Set

D.Napoleon; S.Pavalakodi

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 13 - Issue 7

Published: January 2011

10.5120/1789-2471

D.Napoleon, S.Pavalakodi. A New Method for Dimensionality Reduction using K-Means Clustering Algorithm for High Dimensional Data Set. International Journal of Computer Applications. 13, 7 (January 2011), 41-46. DOI=10.5120/1789-2471

@article{10.5120/1789-2471,
  author    = {D. Napoleon and S. Pavalakodi},
  title     = {A New Method for Dimensionality Reduction using K-Means Clustering Algorithm for High Dimensional Data Set},
  journal   = {International Journal of Computer Applications},
  year      = {2011},
  volume    = {13},
  number    = {7},
  pages     = {41--46},
  doi       = {10.5120/1789-2471},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

Abstract

Clustering is the process of finding groups of objects such that objects in the same group are similar to one another and different from objects in other groups. Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality that corresponds to the intrinsic dimensionality of the data. The K-means clustering algorithm often performs poorly on high-dimensional data. To improve its efficiency, principal component analysis (PCA) is therefore applied to the original data set. This yields a reduced data set whose variables, the principal components, are linearly uncorrelated. In this paper, PCA and a linear transformation are used for dimensionality reduction and the initial centroids are computed. The reduced data are then clustered with the K-means algorithm.
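The pipeline in the abstract has four steps: mean-center the data, project it onto the leading principal components, choose initial centroids, and run K-means. It can be illustrated with the minimal pure-Python sketch below. This is not the authors' implementation. The power-iteration PCA and all helper names (`pca_kmeans`, `initial_centroids`, `top_components`, and so on) are hypothetical. The centroid-initialization rule is especially an assumption, because the abstract does not say how the initial centroids are computed. The sketch sorts points by their first principal-component score, splits them into k equal chunks, and uses each chunk's mean as a starting centroid.

```python
import math
import random


def mean_center(data):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, m, iters=500, seed=0):
    """Leading m eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    d = len(cov)
    a = [row[:] for row in cov]
    rng = random.Random(seed)  # fixed seed: deterministic random start vectors
    comps = []
    for _ in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining variance is numerically zero
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflation: remove this direction so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(centered, comps):
    """Linear transformation: coordinates of each point on the principal components."""
    return [[sum(x * w for x, w in zip(row, v)) for v in comps] for row in centered]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def initial_centroids(points, k):
    """HYPOTHETICAL rule: sort by first PC score, split into k chunks, use chunk means."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = len(points) / k
    cents = []
    for c in range(k):
        chunk = [points[i] for i in order[int(c * size):int((c + 1) * size)]]
        cents.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return cents


def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the given centroids."""
    cents = [c[:] for c in centroids]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(cents)), key=lambda c: sq_dist(p, cents[c])) for p in points]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for c in range(len(cents)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, cents


def pca_kmeans(data, n_components, k):
    centered = mean_center(data)
    comps = top_components(covariance(centered), n_components)
    reduced = project(centered, comps)
    labels, _ = kmeans(reduced, initial_centroids(reduced, k))
    return reduced, labels


data = [[1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.9, 1.0, 1.1],
        [8.0, 8.2, 7.9], [8.1, 7.9, 8.0], [7.9, 8.0, 8.2]]
reduced, labels = pca_kmeans(data, n_components=1, k=2)
print(labels)
```

Here the three-dimensional points are reduced to one principal component before clustering. The first three points should share one label and the last three the other. The sign of an eigenvector is arbitrary, so which group receives label 0 is not meaningful; only the grouping is.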

Index Terms

Computer Science

Information Sciences

Keywords

Clustering, Dimensionality Reduction, Principal Component Analysis, K-means Algorithm, Amalgamation