Research Article

An Exploratory Survey of Hadoop Log Analysis Tools

by  Madhury Mohandas, Dhanya P M
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 75 - Issue 18
Published: August 2013
Authors: Madhury Mohandas, Dhanya P M
DOI: 10.5120/13350-0750

Madhury Mohandas, Dhanya P M. An Exploratory Survey of Hadoop Log Analysis Tools. International Journal of Computer Applications. 75, 18 (August 2013), 33-36. DOI=10.5120/13350-0750

                        @article{ 10.5120/13350-0750,
                        author  = { Madhury Mohandas and Dhanya P M },
                        title   = { An Exploratory Survey of Hadoop Log Analysis Tools },
                        journal = { International Journal of Computer Applications },
                        year    = { 2013 },
                        volume  = { 75 },
                        number  = { 18 },
                        pages   = { 33-36 },
                        doi     = { 10.5120/13350-0750 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2013
                        %A Madhury Mohandas
                        %A Dhanya P M
                        %T An Exploratory Survey of Hadoop Log Analysis Tools
                        %J International Journal of Computer Applications
                        %V 75
                        %N 18
                        %P 33-36
                        %R 10.5120/13350-0750
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

As clusters become more widely used in large-scale computing, keeping them healthy is of paramount importance, which in turn makes supervising and monitoring the cluster essential. Many tools have been developed that can efficiently monitor a Hadoop cluster. Most of these tools gather the necessary information from each node in the cluster and forward it for processing. These diagnostic tools are mostly post-execution analysis tools. This paper presents an exploratory assessment of the different log analyzers used for failure detection and monitoring in Hadoop.

Index Terms
Computer Science
Information Sciences
Keywords

Cloud computing, HDFS, Failure monitoring, Hadoop, Log analyzer
