Research Article

Parallel DNA Sequence Approximate Matching with Multi-Length Sequence Aware Approach

by Hadeel Alazzam, Ahmad Sharieh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Issue 26
Published: Mar 2018
10.5120/ijca2018916594

Hadeel Alazzam, Ahmad Sharieh. Parallel DNA Sequence Approximate Matching with Multi-Length Sequence Aware Approach. International Journal of Computer Applications. 180, 26 (Mar 2018), 1-6. DOI=10.5120/ijca2018916594

                        @article{ 10.5120/ijca2018916594,
                        author  = { Hadeel Alazzam,Ahmad Sharieh },
                        title   = { Parallel DNA Sequence Approximate Matching with Multi-Length Sequence Aware Approach },
                        journal = { International Journal of Computer Applications },
                        year    = { 2018 },
                        volume  = { 180 },
                        number  = { 26 },
                        pages   = { 1-6 },
                        doi     = { 10.5120/ijca2018916594 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2018
                        %A Hadeel Alazzam
                        %A Ahmad Sharieh
                        %T Parallel DNA Sequence Approximate Matching with Multi-Length Sequence Aware Approach
                        %J International Journal of Computer Applications
                        %V 180
                        %N 26
                        %P 1-6
                        %R 10.5120/ijca2018916594
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

DNA sequence approximate matching is one of the main challenges in bioinformatics. Despite advances in technology, there is still a need for new algorithms that can handle the huge volume of bioinformatics data. In this paper, a parallel n-gram approach to approximate matching is proposed, together with a method that takes the variation in DNA sequence lengths into account. The proposed approach showed satisfactory results in terms of time complexity compared to a parallel dynamic programming method.
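The abstract does not spell out the algorithm, so the following is only an illustrative sketch of what n-gram based approximate matching over sequences of different lengths can look like. It is not the authors' method. It assumes a Dice coefficient over n-gram multisets as the similarity score (which normalizes for unequal lengths), and the names `ngrams`, `dice_similarity`, and `rank_matches`, the default `n=3`, and the thread pool are all hypothetical choices made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def ngrams(seq, n=3):
    """Return a multiset (Counter) of the overlapping n-grams in seq."""
    # Sequences shorter than n yield an empty Counter.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def dice_similarity(a, b, n=3):
    """Dice coefficient over n-gram multisets (assumed score, not the paper's).

    Dividing by the combined n-gram count keeps the score in [0, 1]
    even when the two sequences differ in length.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    shared = sum((ga & gb).values())  # multiset intersection
    return 2.0 * shared / total


def rank_matches(query, database, n=3, workers=4):
    """Score every database sequence against query and sort best-first.

    Each comparison is independent, so the work is spread over a pool.
    Threads are used here for simplicity; a process pool would be the
    usual choice for CPU-bound speedups in CPython.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda s: dice_similarity(query, s, n), database))
    return sorted(zip(database, scores), key=lambda p: p[1], reverse=True)
```

For example, `rank_matches("ACGTACGT", ["TTTTTTTT", "ACGTACGA", "ACGTACGT"])` places the exact copy first and the unrelated sequence last. Unlike a dynamic programming alignment, this score ignores the order of the n-grams, which is the usual trade-off for its lower cost.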

Index Terms
Computer Science
Information Sciences
Keywords

DNA Sequence, Longest Common Sequence, N-gram, Parallel
