Optimizing k-means for Scalability

Akansha Agrawal; Shreya Sharma

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 120 - Issue 17

Published: June 2015

10.5120/21320-4337

Akansha Agrawal, Shreya Sharma. Optimizing k-means for Scalability. International Journal of Computer Applications. 120, 17 (June 2015), 20-24. DOI=10.5120/21320-4337

Abstract

Proposed decades ago, k-means is still the most popular clustering algorithm. Despite its drawbacks, its advantages keep it attractive. Several studies have sought to alleviate the problems of k-means. We suggest some simple modifications that optimize k-means for scalability without much sacrifice in precision. The current shift in the emphasis of data mining towards Big Data requires fast algorithms that scale well. We show how time-tested techniques can be adapted to changing needs. The implementation results demonstrate the impact that simple modifications can have.
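
For orientation, the baseline algorithm the abstract refers to can be sketched as follows. This is only the standard Lloyd-style k-means iteration, not the modifications proposed in the article, which the abstract does not specify. The function name `kmeans`, its parameters, the caller-supplied starting centroids, and the sample points are all illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Baseline k-means (hypothetical helper, not the article's variant).

    points            -- list of equal-length numeric tuples
    initial_centroids -- list of k starting centroids chosen by the caller
    Returns (centroids, assignments).
    """
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                )
    return centroids, assignments


# Illustrative data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
```

Each pass compares every point with every centroid, so the cost of one iteration grows with the number of points times the number of clusters. That per-iteration cost is the reason scalability becomes a concern on large data sets.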

Index Terms

Computer Science

Information Sciences

Keywords

Data mining, Big Data, k-means