Research Article

Initializing K-Means Clustering Algorithm using Statistical Information

by Mohammad F. Eltibi, Wesam M. Ashour
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 29 - Issue 7
Published: September 2011
DOI: 10.5120/3573-4930

Mohammad F. Eltibi, Wesam M. Ashour. Initializing K-Means Clustering Algorithm using Statistical Information. International Journal of Computer Applications. 29, 7 (September 2011), 51-55. DOI=10.5120/3573-4930

@article{ 10.5120/3573-4930,
  author    = { Mohammad F. Eltibi and Wesam M. Ashour },
  title     = { Initializing K-Means Clustering Algorithm using Statistical Information },
  journal   = { International Journal of Computer Applications },
  year      = { 2011 },
  volume    = { 29 },
  number    = { 7 },
  pages     = { 51-55 },
  doi       = { 10.5120/3573-4930 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2011
%A Mohammad F. Eltibi
%A Wesam M. Ashour
%T Initializing K-Means Clustering Algorithm using Statistical Information
%J International Journal of Computer Applications
%V 29
%N 7
%P 51-55
%R 10.5120/3573-4930
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The k-means clustering algorithm is one of the best-known clustering algorithms. Nevertheless, it has disadvantages: depending on the random initialization of its prototypes, it may converge to a local optimum rather than the global one. We propose an enhancement to the initialization process of k-means that uses statistical information from the data set to initialize the prototypes. We show that our algorithm gives valid clusters and that it decreases both error and running time.
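
The dependence on initialization can be illustrated with a minimal Python sketch. The abstract does not describe the proposed procedure, so `statistical_init` below is only a hypothetical stand-in: it assumes roughly normal features, estimates each feature's mean and standard deviation by maximum likelihood, and spreads the k prototypes evenly between two standard deviations below and above the mean. All function names are illustrative and are not taken from the paper.

```python
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, prototypes, max_iter=100):
    """Standard Lloyd iterations from the given initial prototypes.

    Returns (final_prototypes, sum_of_squared_errors).
    """
    prototypes = [tuple(p) for p in prototypes]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest prototype.
        clusters = [[] for _ in prototypes]
        for p in points:
            nearest = min(range(len(prototypes)),
                          key=lambda j: sq_dist(p, prototypes[j]))
            clusters[nearest].append(p)
        # Update step: move each prototype to the mean of its cluster;
        # an empty cluster keeps its previous prototype.
        updated = [
            tuple(statistics.fmean(dim) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, prototypes)
        ]
        if updated == prototypes:
            break
        prototypes = updated
    sse = sum(min(sq_dist(p, c) for c in prototypes) for p in points)
    return prototypes, sse


def random_init(points, k, rng):
    """Classical initialization: k distinct data points chosen at random."""
    return rng.sample(points, k)


def statistical_init(points, k):
    """Hypothetical statistical initialization (NOT the paper's procedure).

    Uses the per-feature mean and population standard deviation (the
    maximum likelihood estimates under a normal model) and places the
    prototypes evenly on the segment from mean - 2*std to mean + 2*std.
    """
    dims = list(zip(*points))
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    if k == 1:
        return [tuple(means)]
    offsets = [-2 + 4 * i / (k - 1) for i in range(k)]
    return [tuple(m + t * s for m, s in zip(means, stds)) for t in offsets]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three synthetic Gaussian blobs along the diagonal (illustrative data).
    data = [(rng.gauss(c, 0.5), rng.gauss(c, 0.5))
            for c in (0, 5, 10) for _ in range(50)]
    for trial in range(3):
        _, sse = kmeans(data, random_init(data, 3, rng))
        print(f"random init, trial {trial}: SSE = {sse:.2f}")
    _, sse = kmeans(data, statistical_init(data, 3))
    print(f"statistical init: SSE = {sse:.2f}")
```

Because `statistical_init` is deterministic, repeated runs on the same data yield the same clustering, whereas the final error under `random_init` can vary from run to run. Whether a statistical start actually lowers the error depends on the data; this sketch makes no claim about the results reported in the paper.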

Index Terms
Computer Science
Information Sciences
Keywords

Clustering, K-means Clustering, Initial Prototypes Determination, Central Limit Theory, Normal Distribution, Maximum Likelihood Estimator
