Research Article

A Frequent Concepts Based Document Clustering Algorithm

by Dr. Renu Dhir, Rekha Baghel
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 4 - Issue 5
Published: July 2010
10.5120/826-1171

Dr. Renu Dhir, Rekha Baghel. A Frequent Concepts Based Document Clustering Algorithm. International Journal of Computer Applications. 4, 5 (July 2010), 6-12. DOI=10.5120/826-1171

@article{10.5120/826-1171,
  author    = {Renu Dhir and Rekha Baghel},
  title     = {A Frequent Concepts Based Document Clustering Algorithm},
  journal   = {International Journal of Computer Applications},
  year      = {2010},
  volume    = {4},
  number    = {5},
  pages     = {6--12},
  doi       = {10.5120/826-1171},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2010
%A Dr. Renu Dhir
%A Rekha Baghel
%T A Frequent Concepts Based Document Clustering Algorithm
%J International Journal of Computer Applications
%V 4
%N 5
%P 6-12
%R 10.5120/826-1171
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents a novel document clustering technique based on frequent concepts. The proposed algorithm, FCDC (Frequent Concepts based Document Clustering), works with frequent concepts rather than the frequent items used in traditional text mining techniques. Many well-known clustering algorithms treat documents as bags of words and ignore important relationships between words, such as synonymy. The proposed FCDC algorithm uses the semantic relationships between words to create concepts. It exploits the WordNet ontology to create a low-dimensional feature vector, which allows us to develop an efficient clustering algorithm. It uses a hierarchical approach to cluster text documents that share common concepts. FCDC was found to be more accurate, scalable, and effective than existing clustering algorithms such as Bisecting K-means, UPGMA, and FIHC.
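
The idea of replacing words with concepts before mining frequent features can be illustrated with a minimal sketch. It is not the authors' FCDC implementation: the small `SYNONYM_TO_CONCEPT` table is a hypothetical stand-in for WordNet lookups, the function names and the `min_support` parameter are assumptions made for illustration, and the hierarchical clustering step is omitted.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: each word maps to a shared concept label.
SYNONYM_TO_CONCEPT = {
    "car": "automobile", "auto": "automobile", "automobile": "automobile",
    "doctor": "physician", "physician": "physician", "medic": "physician",
}

def to_concepts(text):
    """Map each token to its concept; unknown words act as their own concept."""
    return [SYNONYM_TO_CONCEPT.get(tok, tok) for tok in text.lower().split()]

def frequent_concepts(docs, min_support):
    """Return concepts that appear in at least min_support (fraction) of docs."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(to_concepts(doc)))  # count each concept once per doc
    threshold = min_support * len(docs)
    return sorted(c for c, n in doc_freq.items() if n >= threshold)

def feature_vectors(docs, concepts):
    """Build one low-dimensional count vector per document over the concepts."""
    vectors = []
    for doc in docs:
        counts = Counter(to_concepts(doc))
        vectors.append([counts[c] for c in concepts])
    return vectors

docs = [
    "the car needs a doctor",
    "my auto is fast",
    "the physician owns an automobile",
    "a medic arrived",
]
concepts = frequent_concepts(docs, min_support=0.75)
vectors = feature_vectors(docs, concepts)
print(concepts)
print(vectors)
```

In this toy corpus no single word reaches the support threshold on its own ("car", "auto", and "automobile" each occur once), but the merged concept does, which is the effect the abstract attributes to using concepts instead of raw terms. The resulting vectors could then be passed to any hierarchical clustering routine.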

Index Terms
Computer Science
Information Sciences
Keywords

Document clustering, Clustering algorithm, Frequent Concepts based Clustering, WordNet
