Research Article

An Alternative Technique of Selecting the Initial Cluster Centers in the k-means Algorithm for Better Clustering

by  Sisir Kumar Rajbongshi, Anjana Kakoti Mahanta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 67 - Issue 7
Published: April 2013
10.5120/11409-6736

Sisir Kumar Rajbongshi, Anjana Kakoti Mahanta. An Alternative Technique of Selecting the Initial Cluster Centers in the k-means Algorithm for Better Clustering. International Journal of Computer Applications. 67, 7 (April 2013), 28-31. DOI=10.5120/11409-6736

                        @article{ 10.5120/11409-6736,
                        author  = { Sisir Kumar Rajbongshi,Anjana Kakoti Mahanta },
                        title   = { An Alternative Technique of Selecting the Initial Cluster Centers in the k-means Algorithm for Better Clustering },
                        journal = { International Journal of Computer Applications },
                        year    = { 2013 },
                        volume  = { 67 },
                        number  = { 7 },
                        pages   = { 28-31 },
                        doi     = { 10.5120/11409-6736 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2013
                        %A Sisir Kumar Rajbongshi
                        %A Anjana Kakoti Mahanta
                        %T An Alternative Technique of Selecting the Initial Cluster Centers in the k-means Algorithm for Better Clustering
                        %J International Journal of Computer Applications
                        %V 67
                        %N 7
                        %P 28-31
                        %R 10.5120/11409-6736
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

Although k-means works well in many cases, it offers no accuracy guarantee and has no built-in way to choose good initial cluster representatives. This article presents a technique in which the initial cluster representatives for the standard k-means algorithm are chosen intelligently. The quality of the clusters produced by the standard k-means algorithm, by k-means using Furthest-First initialization, and by k-means using the proposed initialization technique is compared. Experimental results show that the proposed algorithm improves cluster quality in most cases.
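For readers unfamiliar with the Furthest-First baseline named above, the following is a minimal sketch of the commonly described furthest-first traversal: start from one point, then repeatedly add the point that lies furthest from its nearest already-chosen center. It is not the technique proposed in this article, which the abstract does not specify. The function name `furthest_first_centers`, the `first_index` parameter, and the use of Euclidean distance are assumptions made for illustration only.

```python
import math


def furthest_first_centers(points, k, first_index=0):
    """Pick k initial centers by furthest-first traversal (illustrative sketch).

    points: sequence of equal-length numeric tuples.
    first_index: index of the starting point (an assumption; it is often random).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    centers = [points[first_index]]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        # Refresh each point's distance to its nearest center.
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers


# Hypothetical toy data: the chosen centers spread across the data set.
data = [(0, 0), (1, 0), (10, 0), (10, 10)]
print(furthest_first_centers(data, 3))
```

The returned points would then be passed to k-means as its starting centers in place of randomly selected ones. Because each new center is the most distant remaining point, this baseline tends to favor outliers, which is one reason alternative initialization schemes are studied.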

Index Terms
Computer Science
Information Sciences
Keywords

Cluster representative, cluster quality, Furthest-First technique, centroid
