Research Article

Cluster Analysis Technique based on Bipartite Graph for Human Protein Class Prediction

by Manpreet Singh, Gurvinder Singh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 20 - Issue 3
Published: April 2011
10.5120/2414-3226

Manpreet Singh, Gurvinder Singh. Cluster Analysis Technique based on Bipartite Graph for Human Protein Class Prediction. International Journal of Computer Applications. 20, 3 (April 2011), 22-27. DOI=10.5120/2414-3226

                        @article{ 10.5120/2414-3226,
                        author  = { Manpreet Singh and Gurvinder Singh },
                        title   = { Cluster Analysis Technique based on Bipartite Graph for Human Protein Class Prediction },
                        journal = { International Journal of Computer Applications },
                        year    = { 2011 },
                        volume  = { 20 },
                        number  = { 3 },
                        pages   = { 22-27 },
                        doi     = { 10.5120/2414-3226 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2011
                        %A Manpreet Singh
                        %A Gurvinder Singh
                        %T Cluster Analysis Technique based on Bipartite Graph for Human Protein Class Prediction
                        %J International Journal of Computer Applications
                        %V 20
                        %N 3
                        %P 22-27
                        %R 10.5120/2414-3226
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper applies cluster analysis, a form of unsupervised learning, to human protein class prediction. The protein data come from the Human Protein Reference Database (HPRD), from which sequences belonging to ten molecular classes are obtained, five amino acid sequences per class. Sequence-derived features (SDFs) are then extracted for each sequence using various web-based tools. Priorities are assigned to the SDFs by analyzing the variation in their values. Because every sequence has a value for every SDF, the resulting data form a complete weighted bipartite graph with two disjoint sets of nodes: the sequences and the SDFs. The bipartite graph is stored in memory as a weighted adjacency matrix. Clusters of the data in this matrix are generated from the values of the input SDFs, taking the priority of each SDF into account. The clusters are then backtracked to predict the class of the input sequence.
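The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact priority rule or clustering procedure, so the sketch assumes that priority is proportional to the spread of each min-max-normalized SDF, and that a cluster consists of the `k` stored sequences closest to the query under a priority-weighted distance. The feature names, sequence names, class labels, and numeric values are all hypothetical toy data.

```python
from collections import Counter
from statistics import pstdev

# Hypothetical toy data: sequence name -> (class label, SDF values).
SDFS = ["length", "isoelectric_point", "hydrophobicity"]
SEQUENCES = {
    "seq_A1": ("kinase", [350.0, 6.1, -0.40]),
    "seq_A2": ("kinase", [365.0, 6.3, -0.35]),
    "seq_B1": ("transporter", [720.0, 8.9, 0.55]),
    "seq_B2": ("transporter", [700.0, 9.1, 0.60]),
}


def build_weight_matrix(sequences):
    """Adjacency weight matrix of the complete bipartite graph:
    rows are sequence nodes, columns are SDF nodes, entries are edge weights."""
    names = list(sequences)
    matrix = [list(sequences[name][1]) for name in names]
    return names, matrix


def column_ranges(matrix):
    """(min, max) of every SDF column, used for min-max normalization."""
    return [(min(col), max(col)) for col in zip(*matrix)]


def normalize(row, ranges):
    """Scale each SDF value to the range spanned by the stored sequences."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, ranges)]


def assign_priorities(matrix, ranges):
    """Assumed rule: an SDF with more spread gets a higher priority."""
    norm = [normalize(row, ranges) for row in matrix]
    spreads = [pstdev(col) for col in zip(*norm)]
    total = sum(spreads)
    if total == 0:
        return [1.0 / len(spreads)] * len(spreads)
    return [s / total for s in spreads]


def predict_class(query, sequences, k=2):
    """Cluster the query with its k nearest sequences, then backtrack
    from the cluster members to their class labels (majority vote)."""
    names, matrix = build_weight_matrix(sequences)
    ranges = column_ranges(matrix)
    priorities = assign_priorities(matrix, ranges)
    q = normalize(query, ranges)
    scored = []
    for name, row in zip(names, matrix):
        r = normalize(row, ranges)
        dist = sum(p * abs(a - b) for p, a, b in zip(priorities, q, r))
        scored.append((dist, name))
    cluster = [name for _, name in sorted(scored)[:k]]
    votes = Counter(sequences[name][0] for name in cluster)
    return votes.most_common(1)[0][0], cluster


if __name__ == "__main__":
    label, cluster = predict_class([355.0, 6.2, -0.38], SEQUENCES)
    print(label, cluster)
```

Storing the graph as a dense matrix is natural here because the graph is complete: every sequence node has an edge to every SDF node, so no entry is empty.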

Index Terms
Computer Science
Information Sciences
Keywords

Protein class prediction, cluster analysis, bipartite graph
