Initializing K-Means Clustering Algorithm using Statistical Information

Mohammad F. Eltibi; Wesam M. Ashour

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 29 - Issue 7

Published: September 2011

DOI: 10.5120/3573-4930

Mohammad F. Eltibi, Wesam M. Ashour. Initializing K-Means Clustering Algorithm using Statistical Information. International Journal of Computer Applications. 29, 7 (September 2011), 51-55. DOI=10.5120/3573-4930

@article{10.5120/3573-4930,
  author    = {Mohammad F. Eltibi and Wesam M. Ashour},
  title     = {Initializing K-Means Clustering Algorithm using Statistical Information},
  journal   = {International Journal of Computer Applications},
  year      = {2011},
  volume    = {29},
  number    = {7},
  pages     = {51-55},
  doi       = {10.5120/3573-4930},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2011
%A Mohammad F. Eltibi
%A Wesam M. Ashour
%T Initializing K-Means Clustering Algorithm using Statistical Information
%J International Journal of Computer Applications
%V 29
%N 7
%P 51-55
%R 10.5120/3573-4930
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The k-means algorithm is one of the best-known clustering algorithms; nevertheless, it has several disadvantages, chief among them that it may converge to a local optimum depending on the random initialization of its prototypes. We propose an enhancement to the initialization process of k-means that uses statistical information from the data set to initialize the prototypes. We show that our algorithm gives valid clusters and that it decreases error and time.
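
The dependence on initialization can be illustrated with a small sketch. The code below is not the procedure proposed in the paper, whose details are not given on this page. It is a plain Lloyd-style k-means that accepts explicit starting prototypes, plus a hypothetical initializer, `normal_quantile_init`, that fits a normal distribution to each dimension by maximum likelihood and places the prototypes at evenly spaced quantiles. The function names, the toy data, and the quantile rule are all assumptions made for illustration.

```python
from statistics import NormalDist, fmean, pstdev


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, prototypes, max_iter=100):
    """Lloyd-style k-means from the given prototypes; returns (prototypes, SSE)."""
    prototypes = [tuple(p) for p in prototypes]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest prototype.
        clusters = [[] for _ in prototypes]
        for p in points:
            j = min(range(len(prototypes)), key=lambda i: sq_dist(p, prototypes[i]))
            clusters[j].append(p)
        # Update step: move each prototype to its cluster mean
        # (an empty cluster keeps its old prototype).
        new = [tuple(fmean(col) for col in zip(*cl)) if cl else old
               for cl, old in zip(clusters, prototypes)]
        if new == prototypes:
            break
        prototypes = new
    sse = sum(min(sq_dist(p, c) for c in prototypes) for p in points)
    return prototypes, sse


def normal_quantile_init(points, k):
    """Hypothetical initializer (an assumption, not the paper's method).

    Fits a normal distribution per dimension using the maximum likelihood
    estimates (mean and population standard deviation), then places
    prototype i at quantile (i + 0.5) / k in every dimension.
    Assumes no dimension is constant.
    """
    fits = [NormalDist(fmean(col), pstdev(col)) for col in zip(*points)]
    return [tuple(f.inv_cdf((i + 0.5) / k) for f in fits) for i in range(k)]


# Toy one-dimensional data with three well-separated pairs.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]

# An unlucky start, with two prototypes inside the same pair.
_, sse_bad = kmeans(points, [(0.0,), (1.0,), (10.0,)])

# A start derived from summary statistics of the data.
protos_stat, sse_stat = kmeans(points, normal_quantile_init(points, 3))

print(sse_bad)   # 101.0
print(sse_stat)  # 1.5
```

On this toy data the unlucky start converges to a local optimum that merges two of the pairs, while the statistics-based start recovers all three. This shows only that the starting prototypes matter; it says nothing about how the paper's own initializer behaves.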

Index Terms

Computer Science

Information Sciences

Keywords

Clustering; K-means Clustering; Initial Prototypes Determination; Central Limit Theory; Normal Distribution; Maximum Likelihood Estimator