Research Article

Performance Comparison of Hard and Soft Approaches for Document Clustering

by Vibekananda Dutta, Krishna Kumar Sharma, Deepti Gahalot
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 41 - Issue 7
Published: March 2012
Authors: Vibekananda Dutta, Krishna Kumar Sharma, Deepti Gahalot
DOI: 10.5120/5557-7632

Vibekananda Dutta, Krishna Kumar Sharma, Deepti Gahalot. Performance Comparison of Hard and Soft Approaches for Document Clustering. International Journal of Computer Applications. 41, 7 (March 2012), 44-48. DOI=10.5120/5557-7632

@article{ 10.5120/5557-7632,
  author  = { Vibekananda Dutta and Krishna Kumar Sharma and Deepti Gahalot },
  title   = { Performance Comparison of Hard and Soft Approaches for Document Clustering },
  journal = { International Journal of Computer Applications },
  year    = { 2012 },
  volume  = { 41 },
  number  = { 7 },
  pages   = { 44-48 },
  doi     = { 10.5120/5557-7632 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2012
%A Vibekananda Dutta
%A Krishna Kumar Sharma
%A Deepti Gahalot
%T Performance Comparison of Hard and Soft Approaches for Document Clustering
%J International Journal of Computer Applications
%V 41
%N 7
%P 44-48
%R 10.5120/5557-7632
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The amount of information available through large shared sources such as search engines is growing tremendously. Fast, high-quality document clustering algorithms play an important role in helping users work with vertical search engines and the World Wide Web, and in summarizing and organizing information. Recent surveys have shown that partitional clustering algorithms are better suited to clustering large datasets such as the World Wide Web. K-means is the most commonly used partitional clustering algorithm because it is easy to implement and efficient in terms of execution time. In this paper we present a short overview of soft approaches, specifically an optimal fuzzy document clustering algorithm, in comparison with hard approaches. In our experiments, we applied a hard and a soft approach, K-means and Fuzzy c-means respectively, to several text document datasets. The number of documents in the datasets ranges from 1500 to 2600, and the number of terms ranges from 6000 to over 7500. The results illustrate that the soft approach can generate slightly better results than the hard approach.
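To make the hard versus soft distinction concrete, the following is a minimal sketch, not the authors' implementation. It runs a plain K-means and a plain Fuzzy c-means on five toy documents represented as two-term weight vectors. The data, the function names (`kmeans`, `fuzzy_cmeans`, `memberships`), the fuzzifier `m = 2.0`, the fixed iteration count, and the choice of initial centroids are all assumptions made for illustration; none of them come from the paper.

```python
# Toy comparison of hard (K-means) and soft (Fuzzy c-means) clustering.
# All data and parameter values here are illustrative assumptions.

def sq_dist(a, b):
    """Squared Euclidean distance between two term-weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_mean(points, weights):
    """Weighted average of vectors; the caller ensures sum(weights) > 0."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]

def assign(docs, centroids):
    """Hard assignment: index of the nearest centroid for each document."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
            for x in docs]

def kmeans(docs, centroids, iters=20):
    """Hard clustering: every document belongs to exactly one cluster."""
    for _ in range(iters):
        labels = assign(docs, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [x for x, lab in zip(docs, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(weighted_mean(members, [1.0] * len(members))
                           if members else old)
        centroids = updated
    return assign(docs, centroids), centroids

def memberships(docs, centroids, m=2.0, eps=1e-12):
    """Soft assignment: each row holds membership degrees that sum to 1."""
    rows = []
    for x in docs:
        # eps guards against division by zero when a document sits on a centroid.
        inv = [max(sq_dist(x, c), eps) ** (-1.0 / (m - 1.0)) for c in centroids]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows

def fuzzy_cmeans(docs, centroids, m=2.0, iters=20):
    """Soft clustering: centroids are means weighted by membership ** m."""
    for _ in range(iters):
        u = memberships(docs, centroids, m)
        centroids = [weighted_mean(docs, [row[j] ** m for row in u])
                     for j in range(len(centroids))]
    return memberships(docs, centroids, m), centroids

# Five toy documents over two terms; the last one is deliberately ambiguous.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
init = [docs[0], docs[2]]

hard_labels, _ = kmeans(docs, init)
soft_u, _ = fuzzy_cmeans(docs, init)

print("hard labels:", hard_labels)
for doc, row in zip(docs, soft_u):
    print(doc, [round(v, 3) for v in row])
```

In this toy setup K-means has to place the ambiguous fifth document in a single cluster, while Fuzzy c-means assigns it roughly equal membership in both. This is only meant to illustrate the difference in the kind of output each approach produces; it says nothing about the clustering quality reported in the paper.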

Index Terms
Computer Science
Information Sciences
Keywords

Document Clustering, Hard and Soft Approaches, Text Datasets, Cluster Centroid, Vector Space Model
