A Comparative Study of Data Clustering Algorithms

Geet Singhal; Shipra Panwar; Kanika Jain; Devender Banga

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 83 - Issue 15

Published: December 2013

10.5120/14528-2927

Geet Singhal, Shipra Panwar, Kanika Jain, Devender Banga. A Comparative Study of Data Clustering Algorithms. International Journal of Computer Applications. 83, 15 (December 2013), 41-46. DOI=10.5120/14528-2927

Abstract

Data clustering is the process of partitioning data points into meaningful clusters such that each cluster holds similar data and different clusters hold dissimilar data. It is an unsupervised approach to classifying data into patterns. In general, clustering algorithms fall into two categories: hard clustering, where a data object belongs to exactly one cluster, and soft clustering, where a data object can belong to several clusters. In this report we present a comparative study of three major data clustering algorithms, highlighting their merits and demerits: k-means, fuzzy c-means, and the k-NN clustering algorithm. Choosing an appropriate clustering algorithm for grouping data depends on several factors; one example is the size of the data to be partitioned.
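
The difference between hard and soft clustering described above can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the sample points, and the fuzzifier value m = 2 are assumptions chosen for illustration. The hard assignment picks the nearest center by Euclidean distance, as k-means does in its assignment step, while the soft version uses the standard fuzzy c-means membership formula.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def hard_assign(point, centers):
    """Hard clustering (k-means style): index of the single nearest center."""
    dists = [euclidean(point, c) for c in centers]
    return dists.index(min(dists))

def soft_memberships(point, centers, m=2.0):
    """Soft clustering (fuzzy c-means style): one membership degree per
    center, summing to 1. The fuzzifier m > 1 is an assumed parameter."""
    dists = [euclidean(point, c) for c in centers]
    # A point lying exactly on a center gets full membership there.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]

# Hypothetical example data: two cluster centers and one data point.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

print(hard_assign(point, centers))       # index of the nearest center
print(soft_memberships(point, centers))  # degrees of membership per center
```

In this example the point is closer to the first center, so the hard assignment places it entirely in that cluster, while the soft memberships split it between both clusters in proportion to the inverse squared distances.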

Index Terms

Computer Science

Information Sciences

Keywords

k-means algorithm, c-means algorithm, k-NN algorithm, Euclidean distance, hard clustering, soft clustering