Research Article

Design and Analysis of Large Data Processing Techniques

by  Madhavi Vaidya, Shrinivas Deshpande, Vilas Thakare
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 100 - Issue 8
Published: August 2014
Authors: Madhavi Vaidya, Shrinivas Deshpande, Vilas Thakare
10.5120/17546-8139

Madhavi Vaidya, Shrinivas Deshpande, Vilas Thakare. Design and Analysis of Large Data Processing Techniques. International Journal of Computer Applications. 100, 8 (August 2014), 24-28. DOI=10.5120/17546-8139

                        @article{ 10.5120/17546-8139,
                        author  = { Madhavi Vaidya and Shrinivas Deshpande and Vilas Thakare },
                        title   = { Design and Analysis of Large Data Processing Techniques },
                        journal = { International Journal of Computer Applications },
                        year    = { 2014 },
                        volume  = { 100 },
                        number  = { 8 },
                        pages   = { 24-28 },
                        doi     = { 10.5120/17546-8139 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2014
                        %A Madhavi Vaidya
                        %A Shrinivas Deshpande
                        %A Vilas Thakare
                        %T Design and Analysis of Large Data Processing Techniques
                        %J International Journal of Computer Applications
                        %V 100
                        %N 8
                        %P 24-28
                        %R 10.5120/17546-8139
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

As massive data acquisition and storage become increasingly affordable, a large number of enterprises are employing statisticians to perform sophisticated data analysis. In particular, information extraction is applied when the data is unstructured or semi-structured. Both academia and industry are making efforts to push information extraction inside parallel DBMSs. This raises a significant question: which approach is the better choice for large-scale data processing and analytics? To address this question, this paper compares and analyzes three techniques: Parallel DBMS, MapReduce, and Bulk Synchronous Processing.

Index Terms
Computer Science
Information Sciences
Keywords

Parallel, MapReduce, Hadoop, BSP, Distributed
