Research Article

A Comparative Analysis of Different Categorical Data Clustering Ensemble Methods in Data Mining

by S. Sarumathi, N. Shanthi, M. Sharmila
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 81 - Issue 4
Published: November 2013
DOI: 10.5120/14004-2050

S. Sarumathi, N. Shanthi, M. Sharmila. A Comparative Analysis of Different Categorical Data Clustering Ensemble Methods in Data Mining. International Journal of Computer Applications. 81, 4 (November 2013), 46-55. DOI=10.5120/14004-2050

@article{10.5120/14004-2050,
  author    = {S. Sarumathi and N. Shanthi and M. Sharmila},
  title     = {A Comparative Analysis of Different Categorical Data Clustering Ensemble Methods in Data Mining},
  journal   = {International Journal of Computer Applications},
  year      = {2013},
  volume    = {81},
  number    = {4},
  pages     = {46--55},
  doi       = {10.5120/14004-2050},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
Abstract

Over the past decades, a considerable amount of work has been done in data clustering, an unsupervised learning technique in data mining. A myriad of algorithms and methods have been proposed, focusing on clustering different data types, on representing cluster models, and on the accuracy of the resulting clusters. However, no single clustering algorithm has proved to be the most efficient at providing the best results. To address this issue, a new technique called the cluster ensemble method has emerged, and it is a good alternative approach to the cluster analysis problem. The main aim of a cluster ensemble is to combine different clustering solutions so as to achieve higher accuracy and to improve on the quality of the individual clusterings. Because new methods in data mining continue to be developed at a substantial and unremitting pace, it is necessary to critically analyze the existing techniques and the directions for future innovation. This paper presents a comparative study of different cluster ensemble methods, covering their features, their systematic working process, and the average accuracy and error rates of each ensemble method. This theoretical and comprehensive analysis should be very useful to the community of clustering practitioners and should help them decide which method is most suitable for the problem at hand.
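The idea of combining several clustering solutions can be illustrated with the co-association matrix named in the keywords. The following is a minimal Python sketch, not the method of the authors or of any particular surveyed paper. It assumes that each clustering in the ensemble is given as a list of cluster labels over the same n objects, counts the fraction of clusterings that put each pair of objects together, and then forms a consensus by linking pairs that co-occur in more than half of the clusterings. This is a simple majority-threshold variant of the evidence-accumulation idea. The function names `co_association` and `consensus_by_threshold`, the 0.5 threshold, and the example data are hypothetical choices made for illustration.

```python
from itertools import combinations

def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    clusterings that place objects i and j in the same cluster."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

def consensus_by_threshold(ca, threshold=0.5):
    """Link every pair whose co-association exceeds `threshold` (hypothetical
    default 0.5) and return the connected components as labels 0, 1, 2, ..."""
    n = len(ca)
    parent = list(range(n))  # union-find forest over the n objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if ca[i][j] > threshold:
            parent[find(i)] = find(j)

    # Number the components in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]

# Hypothetical ensemble: three clusterings of five objects.
ensemble = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
labels = consensus_by_threshold(co_association(ensemble))
print(labels)  # [0, 0, 1, 1, 1]
```

Because the consensus looks only at whether two objects share a cluster, it needs no correspondence between label values across clusterings: label 0 in the first clustering and label 1 in the second denote the same group {0, 1} here. This is one reason pairwise co-association is a convenient starting point for ensembles of categorical clusterings.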

Index Terms
Computer Science
Information Sciences
Keywords

Cluster ensemble methods, co-association matrix, consensus function, median partition.
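The median-partition keyword refers to treating the consensus as the partition with the smallest total distance to all members of the ensemble. Finding that partition exactly is computationally hard in general, so the Python sketch below uses a common shortcut, sometimes called best-of-k, that considers only the ensemble members themselves as candidates. The pair-disagreement distance, the function names `pair_disagreements` and `best_of_k_median`, and the example data are assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations

def pair_disagreements(a, b):
    """Count object pairs that are co-clustered in one partition but
    separated in the other (an unnormalized, Mirkin-style distance)."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )

def best_of_k_median(partitions):
    """Return the ensemble member with the smallest summed distance to all
    members; this approximates the median partition by restricting the
    candidates to the input clusterings."""
    return min(
        partitions,
        key=lambda p: sum(pair_disagreements(p, q) for q in partitions),
    )

# Hypothetical ensemble: three clusterings of four objects.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 1, 1, 1],
]
print(best_of_k_median(ensemble))  # [0, 0, 1, 1]
```

In this example the summed distances of the three members are 4, 5 and 7, so the first clustering is returned. A true median partition may lie outside the ensemble, which is why exact or heuristic search over all partitions is used when this approximation is too coarse.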
