An Alternative Technique of Selecting the Initial Cluster Centers in the k-means Algorithm for Better Clustering

Sisir Kumar Rajbongshi; Anjana Kakoti Mahanta

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 67 - Issue 7

Published: April 2013

10.5120/11409-6736

Sisir Kumar Rajbongshi, Anjana Kakoti Mahanta. An Alternative Technique of Selecting the Initial Cluster Centers in the k-means Algorithm for Better Clustering. International Journal of Computer Applications. 67, 7 (April 2013), 28-31. DOI=10.5120/11409-6736

                        @article{ 10.5120/11409-6736,
                        author  = { Sisir Kumar Rajbongshi,Anjana Kakoti Mahanta },
                        title   = { An Alternative Technique of Selecting the Initial Cluster Centers in the k-means Algorithm for Better Clustering },
                        journal = { International Journal of Computer Applications },
                        year    = { 2013 },
                        volume  = { 67 },
                        number  = { 7 },
                        pages   = { 28-31 },
                        doi     = { 10.5120/11409-6736 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2013
                        %A Sisir Kumar Rajbongshi
                        %A Anjana Kakoti Mahanta
                        %T An Alternative Technique of Selecting the Initial Cluster Centers in the k-means Algorithm for Better Clustering
                        %J International Journal of Computer Applications
                        %V 67
                        %N 7
                        %P 28-31
                        %R 10.5120/11409-6736
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

Although k-means works well in many cases, it offers no accuracy guarantee and provides no principled way to select good initial cluster representatives. This article presents a technique in which the initial cluster representatives of the standard k-means algorithm are chosen intelligently. The quality of the clusters produced by the standard k-means algorithm, by k-means with Furthest-First initialization, and by k-means with the proposed initialization technique is compared. Experimental results show that the proposed technique improves cluster quality in most cases.
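
For readers unfamiliar with the Furthest-First baseline named in the abstract, the following is a minimal sketch of furthest-first traversal for picking initial centers. It is not the authors' proposed technique, which the abstract does not specify. The function names, the random choice of the first center, and the use of the sum of squared errors (SSE) as a quality measure are all assumptions made for illustration.

```python
import math
import random


def furthest_first_centers(points, k, seed=0):
    """Pick k initial centers by furthest-first traversal.

    points: list of equal-length numeric tuples; k: number of centers.
    Choosing the first center at random is an assumption of this sketch.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    centers = [points[first]]
    # Distance from every point to its nearest center chosen so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point furthest from all chosen centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center.

    Used here as one possible quality measure (an assumption).
    """
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
```

The returned centers would then seed the usual k-means assignment and update loop in place of uniformly random starting points, and a lower SSE after convergence would indicate tighter clusters under this particular measure.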

Index Terms

Computer Science

Information Sciences

Keywords

Cluster representative, cluster quality, Furthest-First technique, centroid