International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
|
Volume 75 - Issue 16 |
Published: August 2013 |
Authors: Vrushali Y. Kulkarni, Aashu Singh, Pradeep K Sinha |
![]() |
Vrushali Y. Kulkarni, Aashu Singh, Pradeep K Sinha . An Approach towards Optimizing Random Forest using Dynamic Programming Algorithm. International Journal of Computer Applications. 75, 16 (August 2013), 9-16. DOI=10.5120/13193-0785
@article{ 10.5120/13193-0785, author = { Vrushali Y. Kulkarni,Aashu Singh,Pradeep K Sinha }, title = { An Approach towards Optimizing Random Forest using Dynamic Programming Algorithm }, journal = { International Journal of Computer Applications }, year = { 2013 }, volume = { 75 }, number = { 16 }, pages = { 9-16 }, doi = { 10.5120/13193-0785 }, publisher = { Foundation of Computer Science (FCS), NY, USA } }
%0 Journal Article %D 2013 %A Vrushali Y. Kulkarni %A Aashu Singh %A Pradeep K Sinha %T An Approach towards Optimizing Random Forest using Dynamic Programming Algorithm%T %J International Journal of Computer Applications %V 75 %N 16 %P 9-16 %R 10.5120/13193-0785 %I Foundation of Computer Science (FCS), NY, USA
Random Forest (RF) is an ensemble supervised machine learning technique. Based on bagging and random feature selection, number of decision trees (base classifiers) is generated and majority voting is taken among them. The size of RF is subjective and varies from one dataset to another. Furthermore due to the randomization induced during creation, and its huge size, RF has at best been described as black-box. Various changes based on the experimental results have been proposed in the algorithm since then to optimize the performance of RF. To this end, we aim to find a subset, having accuracy comparable to original RF but having a much smaller size. In this paper, we show that the problem of selection of optimal subset of random forest follows the dynamic programming paradigm. Applying this approach to various UCI data-sets, corresponding subsets are obtained and studied. We conclude that such subsets do exist and that they are not unique. Moreover the size of these subsets is small fraction of the original RF (in the range of tens) and that accuracy of these subsets is a discrete valued function over its range.