Research Article

Scalability of Parallel Genetic Algorithm for Two-mode Clustering

by  Briti Deb, Satish Narayana Srirama
journal cover
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Issue 14
Published: May 2014
Authors: Briti Deb, Satish Narayana Srirama
10.5120/16411-5829
PDF

Briti Deb, Satish Narayana Srirama . Scalability of Parallel Genetic Algorithm for Two-mode Clustering. International Journal of Computer Applications. 94, 14 (May 2014), 23-26. DOI=10.5120/16411-5829

                        @article{ 10.5120/16411-5829,
                        author  = { Briti Deb,Satish Narayana Srirama },
                        title   = { Scalability of Parallel Genetic Algorithm for Two-mode Clustering },
                        journal = { International Journal of Computer Applications },
                        year    = { 2014 },
                        volume  = { 94 },
                        number  = { 14 },
                        pages   = { 23-26 },
                        doi     = { 10.5120/16411-5829 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2014
                        %A Briti Deb
                        %A Satish Narayana Srirama
                        %T Scalability of Parallel Genetic Algorithm for Two-mode Clustering%T 
                        %J International Journal of Computer Applications
                        %V 94
                        %N 14
                        %P 23-26
                        %R 10.5120/16411-5829
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

Data matrix having the same set of entity in the rows and cloumns is known as one-mode data matrix, and traditional one-mode clustering algorithms can be used to cluster the rows (or columns) separately. With the popularity of use of two-mode data matrices where the rows and columns have different sets of entities, the need for simultaneous clustering of rows and columns popularly known as two-mode clustering increased. Additionally, the emergence of large data sets and the prediction of Moore's law slow-down have created the challenge of clustering scalability. In this paper, we address the problem of scalability of organizing an unlabelled two-mode dataset into clusters utilizing multicore processor. We propose a parallel genetic algorithm (GA) heuristics based two-mode clustering algorithm, which is an adaptation of the classical Cuthill-McKee Matrix Bandwidth Minimization (MBM) algorithm. The classical MBM method aims at reducing the bandwidth of a sparse symmetric matrix, which we adapted to make it suitable for non-symmetric real-valued matrix. Preliminary results indicate that our algorithm is scalable on multicore processor compared to serial implementation. Future work will include more extensive experiments and evaluations of the system.

References
  • Suderman, M. , & Hallett, M. 2007. Tools for visually exploring biological networks. Bioinformatics, 23(20), 2651-2659.
  • Chu, Cheng-Tao, et al. Map-reduce for machine learning on multicore. NIPS. Vol. 6. (2006).
  • Borgatti, Stephen P. , and Martin G. Everett. Network analysis of 2-mode data. Social networks 19. 3 (1997): 243-269.
  • Van Mechelen, Iven, Hans-Hermann Bock, and Paul De Boeck. Two-mode clustering methods: a structured overview. Statistical methods in medical research 13. 5 (2004): 363-394.
  • E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices In Proc. 24th Nat. Conf. ACM, pages 157–172, (1969).
  • Hageman, J. A. , van den Berg, R. A. , Westerhuis, J. A. , van der Werf, M. J. , & Smilde, A. K. (2008). Genetic algorithm based two-mode clustering of metabolomics data. Metabolomics, 4(2), 141-149.
  • Estivill-Castro, Vladimir (June 2002). Why so many clustering algorithms — A Position Paper. ACM SIGKDD Explorations Newsletter 4 (1): 65–75.
  • Cheng, Y. , & Church, G. M. (2000, August). Biclustering of expression data. In ISMB, Vol. 8, pp. 93-103.
  • Dhillon, I. S. 2001, August. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 269-274. ACM.
  • Liiv, I. 2010. Seriation and matrix reordering methods: An historical overview. Statistical analysis and data mining, 3(2), 70-91.
  • Schaeffer, S. E. 2007. Graph clustering. Computer Science Review, 1(1), 27-64.
  • Esposito, A. , Catalano, M. S. , Malucelli, F. , & Tarricone, L. 1998. A new matrix bandwidth reduction algorithm. Operations Research Letters, 23(3), 99-107.
  • Hageman, J. A. , Malosetti, M. , & Van Eeuwijk, F. A. 2012. Two-mode clustering of genotype by trait and genotype by environment data. Euphytica, 183(3), 349-359.
  • Zhang, P. , Wang, J. , Li, X. , Li, M. , Di, Z. , & Fan, Y. 2008. Clustering coefficient and community structure of bipartite networks. Physica A: Statistical Mechanics and its Applications, 387(27), 6869-6875.
  • Arabie, P. , Schleutermann, S. , Daws, J. , and Hubert, L. 1988, Marketing Applications of Sequencing and Partitioning of Nonsymmetric and/or Two-Mode Matrices, in Data, Expert Knowledge and Decisions, Eds. W. Gaul, and M. Schader, Berlin: Springer-Verlag, 215-224.
  • Fonseca, Y. C. F. A bipartite graph co-clustering approach to ontology mapping, Proceedings of the Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data. Colocated with the Second International Semantic Web Conference (ISWC-03), CEUR-WS. org, 2003.
  • Madeira, S. C. , & Oliveira, A. L. 2004. Biclustering algorithms for biological data analysis: a survey. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 1(1), 24-45. Chicago
  • Scrucca, L. 2012. GA: a package for genetic algorithms in R. Journal of Statistical Software, Volume 53, Issue 4.
  • Yeast expression matrix, URL accessed on 02-01-2014: http://arep. med. harvard. edu/biclustering/yeast. matrix
Index Terms
Computer Science
Information Sciences
No index terms available.
Keywords

Scalability two-mode clustering parallel genetic algorithm matrix reordering

Powered by PhDFocusTM