Research Article

An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud

by Siddu P. Algur, Leena I. Sakri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 145 - Issue 15
Published: Jul 2016
Authors: Siddu P. Algur, Leena I. Sakri
10.5120/ijca2016910882

Siddu P. Algur, Leena I. Sakri. An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud. International Journal of Computer Applications. 145, 15 (Jul 2016), 22-30. DOI=10.5120/ijca2016910882

                        @article{ 10.5120/ijca2016910882,
                        author  = { Siddu P. Algur and Leena I. Sakri },
                        title   = { An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud },
                        journal = { International Journal of Computer Applications },
                        year    = { 2016 },
                        volume  = { 145 },
                        number  = { 15 },
                        pages   = { 22-30 },
                        doi     = { 10.5120/ijca2016910882 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2016
                        %A Siddu P. Algur
                        %A Leena I. Sakri
                        %T An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud
                        %J International Journal of Computer Applications
                        %V 145
                        %N 15
                        %P 22-30
                        %R 10.5120/ijca2016910882
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

Genomic sequence alignment across species is one of the most sought-after applications in bioinformatics. Future bioinformatics technologies are expected to produce terabytes of genomic data. Sequence alignment at this scale has traditionally required supercomputers, which are costly. Parallelization offers a way to compute sequence alignments within limited cost and time. Cloud computing and the MapReduce framework play an important role in parallelizing data-intensive bioinformatics applications because they provide consistent performance over time and good fault tolerance. Existing gene sequencing methodologies are built on the Hadoop MapReduce framework, which adopts a serial execution strategy, and this is an area of concern. This work introduces Smith-Waterman alignment on a Bulk Synchronous Parallel MapReduce (SW-BSPMR) cloud platform for gene sequence alignment. It adopts the widely accepted and accurate Smith-Waterman (SW) algorithm for sequence alignment, together with a parallel synchronous scheduling methodology for the map and reduce phases. A customized MapReduce framework is developed on the Microsoft Azure cloud platform to overcome this issue in Hadoop MapReduce. The experimental study presented in this work shows that SW-BSPMR can accurately and effectively align genomic sequences of various read lengths.
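
To make the two ingredients named in the abstract concrete, the following is a minimal sketch in Python, not the authors' SW-BSPMR implementation. It assumes a linear gap penalty and illustrative scoring values (match 2, mismatch -1, gap -1), none of which are stated in the abstract. The function names `smith_waterman_score` and `align_reads` are hypothetical. A local thread pool stands in for cloud workers, so it shows only the map, barrier, reduce structure of a bulk synchronous step and not real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b.

    Scoring values are illustrative assumptions; gaps use a linear penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def align_reads(reference, reads, workers=4):
    """Run one map step, wait at a barrier, then run one reduce step."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map: score every read against the reference concurrently.
        # list() acts as the barrier: it returns only when all tasks finish.
        scores = list(pool.map(lambda r: smith_waterman_score(reference, r), reads))
    # Reduce: pair each read with its score and pick the best-scoring read.
    scored = list(zip(reads, scores))
    return scored, max(scored, key=lambda pair: pair[1])
```

Calling `align_reads("GGACGTGG", ["ACGT", "TTTT", "ACG"])` returns the per-read scores and selects `("ACGT", 8)` as the best match. In CPython a thread pool does not speed up CPU-bound scoring; a process pool or distributed workers would be needed for that, which is the role the cloud platform plays in the paper.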

Index Terms
Computer Science
Information Sciences
Keywords

Bioinformatics, genomic sequence, scheduler, parallelization, Hadoop, Microsoft Azure
