Research Article

Optimizing k-means for Scalability

by Akansha Agrawal, Shreya Sharma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 120 - Issue 17
Published: June 2015
DOI: 10.5120/21320-4337

Akansha Agrawal, Shreya Sharma. Optimizing k-means for Scalability. International Journal of Computer Applications. 120, 17 (June 2015), 20-24. DOI=10.5120/21320-4337

@article{10.5120/21320-4337,
  author    = {Akansha Agrawal and Shreya Sharma},
  title     = {Optimizing k-means for Scalability},
  journal   = {International Journal of Computer Applications},
  year      = {2015},
  volume    = {120},
  number    = {17},
  pages     = {20--24},
  doi       = {10.5120/21320-4337},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2015
%A Akansha Agrawal
%A Shreya Sharma
%T Optimizing k-means for Scalability
%J International Journal of Computer Applications
%V 120
%N 17
%P 20-24
%R 10.5120/21320-4337
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Proposed decades ago, k-means is still the most popular algorithm for clustering. Despite its drawbacks, its advantages make it highly attractive, and considerable research has been conducted to alleviate its problems. We suggest here some simple modifications that optimize k-means for scalability without much sacrifice in precision. The current shift in the emphasis of data mining towards Big Data requires fast algorithms that scale well, and we show how time-tested techniques can be adapted to these changing needs. The implementation results demonstrate the impact that such simple modifications can bring.
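The abstract does not spell out the proposed modifications, so the sketch below shows only the baseline they start from: standard k-means (Lloyd's iteration) with Forgy-style random initialization, written in plain Python. It is not the authors' optimized variant, and the names `lloyd_kmeans` and `squared_distance` are hypothetical. The assignment step compares every point with every centroid, costing O(n·k·d) per iteration for n points, k clusters and d dimensions, which is where the per-iteration cost on large data sets comes from.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(
    points: List[Point], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Baseline k-means (Lloyd's iteration), NOT the article's modified version.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Forgy-style initialization: k distinct input points become the starting centroids.
    centroids: List[Point] = [tuple(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    dim = len(points[0])
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (O(n * k * d)).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(p[d] for p in members) / len(members) for d in range(dim)
                )
    return centroids, labels
```

For example, calling `lloyd_kmeans` with `k=2` on the six points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) groups the first three points under one label and the last three under the other, with centroids at (1/3, 1/3) and (31/3, 31/3). The two-step structure (assign, then update) is what any scalability-oriented modification has to preserve or approximate.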

Index Terms
Computer Science
Information Sciences
Keywords

Data mining, Big Data, k-means
