Design and Analysis of Large Data Processing Techniques

Madhavi Vaidya; Shrinivas Deshpande; Vilas Thakare

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 100 - Issue 8

Published: August 2014

DOI: 10.5120/17546-8139

Madhavi Vaidya, Shrinivas Deshpande, Vilas Thakare. Design and Analysis of Large Data Processing Techniques. International Journal of Computer Applications 100, 8 (August 2014), 24-28. DOI=10.5120/17546-8139

@article{10.5120/17546-8139,
  author    = {Madhavi Vaidya and Shrinivas Deshpande and Vilas Thakare},
  title     = {Design and Analysis of Large Data Processing Techniques},
  journal   = {International Journal of Computer Applications},
  year      = {2014},
  volume    = {100},
  number    = {8},
  pages     = {24-28},
  doi       = {10.5120/17546-8139},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2014
%A Madhavi Vaidya
%A Shrinivas Deshpande
%A Vilas Thakare
%T Design and Analysis of Large Data Processing Techniques
%J International Journal of Computer Applications
%V 100
%N 8
%P 24-28
%R 10.5120/17546-8139
%I Foundation of Computer Science (FCS), NY, USA

Abstract

As massive data acquisition and storage become increasingly affordable, a large number of enterprises are employing statisticians to perform sophisticated data analysis. Information extraction is particularly needed when the data are unstructured or semi-structured. There is an emerging effort in both academia and industry to push information extraction inside parallel DBMSs. This raises a significant question: which approach is the better choice for large-scale data processing and analytics? To address this question, this paper compares and analyzes three techniques: parallel DBMSs, MapReduce, and Bulk Synchronous Parallel (BSP) processing.
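The abstract names three processing models without showing how they differ for a programmer. The following is a minimal single-machine Python sketch, assuming toy inputs, that contrasts them: a declarative SQL aggregate (the kind of interface a parallel DBMS exposes, run here through the standard `sqlite3` module on one node), the same word count written as MapReduce-style `map`/`reduce` functions with an explicit shuffle, and a BSP computation organized as supersteps separated by barriers. The function names (`sql_word_count`, `mapreduce_word_count`, `bsp_max`) and the max-propagation task are hypothetical illustrations, not the paper's code, and nothing here actually runs in parallel.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple


# 1. Parallel-DBMS style (hypothetical example): a declarative query. A parallel
#    DBMS would decide how to partition and aggregate; sqlite3 runs on one node.
def sql_word_count(docs: List[str]) -> Dict[str, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.executemany(
        "INSERT INTO words VALUES (?)",
        [(w.lower(),) for doc in docs for w in doc.split()],
    )
    rows = conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word").fetchall()
    conn.close()
    return dict(rows)


# 2. MapReduce style (hypothetical example): user-written map and reduce
#    functions; the framework groups intermediate pairs by key in between.
def map_fn(doc: str) -> List[Tuple[str, int]]:
    return [(w.lower(), 1) for w in doc.split()]


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def mapreduce_word_count(docs: List[str]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:                      # each doc stands in for one input split
        for key, value in map_fn(doc):
            groups[key].append(value)     # shuffle: group values by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())  # reduce phase


# 3. BSP style (hypothetical example): supersteps of local computation,
#    message sending, and a barrier after which messages become visible.
#    Task: with symmetric neighbour lists, every processor learns the
#    maximum value in its connected component.
def bsp_max(values: List[int], neighbours: Dict[int, List[int]],
            max_supersteps: int = 100) -> List[int]:
    n = len(values)
    state = list(values)
    inbox: Dict[int, List[int]] = {i: [] for i in range(n)}
    for step in range(max_supersteps):
        # Local computation: own state plus messages delivered at the last barrier.
        new_state = [max([state[i]] + inbox[i]) for i in range(n)]
        # Communication: each processor sends its current value to its neighbours.
        outbox: Dict[int, List[int]] = {i: [] for i in range(n)}
        for i, v in enumerate(new_state):
            for j in neighbours.get(i, []):
                outbox[j].append(v)
        changed = new_state != state
        # Barrier: this superstep's outbox becomes the next superstep's inbox.
        state, inbox = new_state, outbox
        if step > 0 and not changed:
            break  # no processor changed its value, so all vote to halt
    return state
```

For example, `mapreduce_word_count(["Hadoop MapReduce", "MapReduce BSP"])` and `sql_word_count` on the same input both count `"mapreduce"` twice, while `bsp_max([3, 1, 2], {0: [1], 1: [0, 2], 2: [1]})` converges to `[3, 3, 3]` after a few supersteps. The sketch is meant to show where control lies: in the SQL version the engine chooses the execution plan, in the MapReduce version the map-shuffle-reduce dataflow is fixed, and in the BSP version the programmer writes both the per-superstep computation and the messaging.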

Index Terms

Computer Science

Information Sciences

Keywords

Parallel, MapReduce, Hadoop, BSP, Distributed