Research Article

Study of Various Decision Tree Pruning Methods with their Empirical Comparison in WEKA

by  Nikita Patel, Saurabh Upadhyay
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 60 - Issue 12
Published: December 2012
DOI: 10.5120/9744-4304

Nikita Patel, Saurabh Upadhyay. Study of Various Decision Tree Pruning Methods with their Empirical Comparison in WEKA. International Journal of Computer Applications 60, 12 (December 2012), 20-25. DOI=10.5120/9744-4304

Abstract

Classification is an important problem in data mining: given a data set, a classifier generates a meaningful description of each class. Decision trees are among the most effective and widely used classification methods, and several algorithms exist for inducing them. A tree is typically induced first and then simplified in a subsequent pruning phase, which removes subtrees to improve accuracy and prevent overfitting. In this paper, various pruning methods are discussed along with their features, and the effectiveness of pruning is evaluated empirically: accuracy and tree size are measured for the diabetes and glass datasets under various pruning factors.
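The post-pruning idea the abstract refers to can be illustrated with a minimal reduced-error pruning sketch. This is pure Python; the toy tree, features, and validation set below are invented for illustration and are not the paper's diabetes/glass data or WEKA's J48 implementation:

```python
# A minimal sketch of reduced-error post-pruning on a toy decision tree.
# The tree structure and validation set are hypothetical examples.

def classify(tree, sample):
    """Route a sample down the tree; a leaf is just a class label."""
    if not isinstance(tree, dict):
        return tree
    return classify(tree["branches"][sample[tree["feature"]]], sample)

def accuracy(tree, data):
    """Fraction of (sample, label) pairs the tree classifies correctly."""
    return sum(classify(tree, s) == y for s, y in data) / len(data)

def prune(tree, data):
    """Bottom-up reduced-error pruning: replace a subtree with the majority
    label of the validation samples reaching it, whenever doing so does not
    lower validation accuracy."""
    if not isinstance(tree, dict):
        return tree
    branches = {}
    for value, sub in tree["branches"].items():
        subset = [(s, y) for s, y in data if s[tree["feature"]] == value]
        branches[value] = prune(sub, subset) if subset else sub
    tree = {"feature": tree["feature"], "branches": branches}
    if data:
        labels = [y for _, y in data]
        leaf = max(set(labels), key=labels.count)
        if accuracy(leaf, data) >= accuracy(tree, data):
            return leaf  # the collapsed leaf is at least as accurate
    return tree

# Toy tree: test feature 0; when it is 1, also test feature 1.
tree = {"feature": 0, "branches": {
    0: "no",
    1: {"feature": 1, "branches": {0: "yes", 1: "no"}}}}

# Hypothetical validation set on which the inner split on feature 1 hurts.
val = [((0, 0), "no"), ((0, 1), "no"), ((1, 0), "yes"), ((1, 1), "yes")]

pruned = prune(tree, val)
print(accuracy(tree, val), accuracy(pruned, val))  # 0.75 1.0
print(pruned)  # {'feature': 0, 'branches': {0: 'no', 1: 'yes'}}
```

This sketches the general post-pruning principle (prune a subtree when a held-out set shows no loss in accuracy); in WEKA, J48 exposes a variant of it through its reduced-error pruning option, alongside its default confidence-based pruning.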

Index Terms
Computer Science
Information Sciences
Keywords

Attribute Selection Measures, Decision tree, Post-pruning, Pre-pruning
