Research Article

A Comparative Study of Data Clustering Algorithms

by Geet Singhal, Shipra Panwar, Kanika Jain, Devender Banga
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 83 - Issue 15
Published: December 2013
DOI: 10.5120/14528-2927

Geet Singhal, Shipra Panwar, Kanika Jain, Devender Banga. A Comparative Study of Data Clustering Algorithms. International Journal of Computer Applications. 83, 15 (December 2013), 41-46. DOI: 10.5120/14528-2927

Abstract

Data clustering is the process of partitioning data points into meaningful clusters such that points within a cluster are similar to one another and points in different clusters are dissimilar. It is an unsupervised approach to classifying data into different patterns. In general, clustering algorithms fall into two categories: hard clustering, where a data object belongs to exactly one cluster, and soft clustering, where a data object can belong to more than one cluster. In this report we present a comparative study of three major data clustering algorithms, highlighting their merits and demerits. These algorithms are k-means, fuzzy c-means, and the K-NN clustering algorithm. Choosing an appropriate clustering algorithm for grouping the data depends on various factors; one, for illustration, is the size of the data to be partitioned.
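
The distinction between hard and soft clustering can be illustrated with a minimal sketch. It is not the authors' implementation and is not a complete k-means or fuzzy c-means procedure: the cluster centers are fixed, hypothetical values rather than learned ones, the function names are invented for illustration, and the fuzzifier m = 2.0 is an assumed setting. The soft assignment uses the standard fuzzy c-means membership formula with Euclidean distance.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    dists = [euclidean(point, c) for c in centers]
    return dists.index(min(dists))

def soft_assign(point, centers, m=2.0):
    """Soft clustering: fuzzy c-means style membership in every cluster.

    m > 1 is the fuzzifier (2.0 is an assumed value); memberships sum to 1.
    """
    dists = [euclidean(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in dists:
        zeros = [d == 0.0 for d in dists]
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

# Hypothetical data: two fixed centers and one point lying between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

label = hard_assign(point, centers)        # one cluster index
memberships = soft_assign(point, centers)  # one degree of membership per cluster
```

For this example the hard assignment places the point in cluster 0 only, while the soft assignment gives it memberships of about 0.9 in cluster 0 and 0.1 in cluster 1.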

Index Terms
Computer Science
Information Sciences
Keywords

k-means algorithm, c-means algorithm, k-NN algorithm, Euclidean distance, hard clustering, soft clustering
