Scalability of Parallel Genetic Algorithm for Two-mode Clustering

Briti Deb; Satish Narayana Srirama

Research Article

Scalability of Parallel Genetic Algorithm for Two-mode Clustering

by Briti Deb, Satish Narayana Srirama

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 94 - Issue 14

Published: May 2014

Authors: Briti Deb, Satish Narayana Srirama

10.5120/16411-5829

PDF

Briti Deb, Satish Narayana Srirama . Scalability of Parallel Genetic Algorithm for Two-mode Clustering. International Journal of Computer Applications. 94, 14 (May 2014), 23-26. DOI=10.5120/16411-5829

                        @article{ 10.5120/16411-5829,
                        author  = { Briti Deb,Satish Narayana Srirama },
                        title   = { Scalability of Parallel Genetic Algorithm for Two-mode Clustering },
                        journal = { International Journal of Computer Applications },
                        year    = { 2014 },
                        volume  = { 94 },
                        number  = { 14 },
                        pages   = { 23-26 },
                        doi     = { 10.5120/16411-5829 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2014
                        %A Briti Deb
                        %A Satish Narayana Srirama
                        %T Scalability of Parallel Genetic Algorithm for Two-mode Clustering%T 
                        %J International Journal of Computer Applications
                        %V 94
                        %N 14
                        %P 23-26
                        %R 10.5120/16411-5829
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

Data matrix having the same set of entity in the rows and cloumns is known as one-mode data matrix, and traditional one-mode clustering algorithms can be used to cluster the rows (or columns) separately. With the popularity of use of two-mode data matrices where the rows and columns have different sets of entities, the need for simultaneous clustering of rows and columns popularly known as two-mode clustering increased. Additionally, the emergence of large data sets and the prediction of Moore's law slow-down have created the challenge of clustering scalability. In this paper, we address the problem of scalability of organizing an unlabelled two-mode dataset into clusters utilizing multicore processor. We propose a parallel genetic algorithm (GA) heuristics based two-mode clustering algorithm, which is an adaptation of the classical Cuthill-McKee Matrix Bandwidth Minimization (MBM) algorithm. The classical MBM method aims at reducing the bandwidth of a sparse symmetric matrix, which we adapted to make it suitable for non-symmetric real-valued matrix. Preliminary results indicate that our algorithm is scalable on multicore processor compared to serial implementation. Future work will include more extensive experiments and evaluations of the system.

References

Suderman, M. , & Hallett, M. 2007. Tools for visually exploring biological networks. Bioinformatics, 23(20), 2651-2659.
Chu, Cheng-Tao, et al. Map-reduce for machine learning on multicore. NIPS. Vol. 6. (2006).
Borgatti, Stephen P. , and Martin G. Everett. Network analysis of 2-mode data. Social networks 19. 3 (1997): 243-269.
Van Mechelen, Iven, Hans-Hermann Bock, and Paul De Boeck. Two-mode clustering methods: a structured overview. Statistical methods in medical research 13. 5 (2004): 363-394.
E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices In Proc. 24th Nat. Conf. ACM, pages 157–172, (1969).
Hageman, J. A. , van den Berg, R. A. , Westerhuis, J. A. , van der Werf, M. J. , & Smilde, A. K. (2008). Genetic algorithm based two-mode clustering of metabolomics data. Metabolomics, 4(2), 141-149.
Estivill-Castro, Vladimir (June 2002). Why so many clustering algorithms — A Position Paper. ACM SIGKDD Explorations Newsletter 4 (1): 65–75.
Cheng, Y. , & Church, G. M. (2000, August). Biclustering of expression data. In ISMB, Vol. 8, pp. 93-103.
Dhillon, I. S. 2001, August. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 269-274. ACM.
Liiv, I. 2010. Seriation and matrix reordering methods: An historical overview. Statistical analysis and data mining, 3(2), 70-91.
Schaeffer, S. E. 2007. Graph clustering. Computer Science Review, 1(1), 27-64.
Esposito, A. , Catalano, M. S. , Malucelli, F. , & Tarricone, L. 1998. A new matrix bandwidth reduction algorithm. Operations Research Letters, 23(3), 99-107.
Hageman, J. A. , Malosetti, M. , & Van Eeuwijk, F. A. 2012. Two-mode clustering of genotype by trait and genotype by environment data. Euphytica, 183(3), 349-359.
Zhang, P. , Wang, J. , Li, X. , Li, M. , Di, Z. , & Fan, Y. 2008. Clustering coefficient and community structure of bipartite networks. Physica A: Statistical Mechanics and its Applications, 387(27), 6869-6875.
Arabie, P. , Schleutermann, S. , Daws, J. , and Hubert, L. 1988, Marketing Applications of Sequencing and Partitioning of Nonsymmetric and/or Two-Mode Matrices, in Data, Expert Knowledge and Decisions, Eds. W. Gaul, and M. Schader, Berlin: Springer-Verlag, 215-224.
Fonseca, Y. C. F. A bipartite graph co-clustering approach to ontology mapping, Proceedings of the Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data. Colocated with the Second International Semantic Web Conference (ISWC-03), CEUR-WS. org, 2003.
Madeira, S. C. , & Oliveira, A. L. 2004. Biclustering algorithms for biological data analysis: a survey. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 1(1), 24-45. Chicago
Scrucca, L. 2012. GA: a package for genetic algorithms in R. Journal of Statistical Software, Volume 53, Issue 4.
Yeast expression matrix, URL accessed on 02-01-2014: http://arep. med. harvard. edu/biclustering/yeast. matrix

Index Terms

Computer Science

Information Sciences

No index terms available.

Keywords

Scalability two-mode clustering parallel genetic algorithm matrix reordering