International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Issue 21
Published: March 2013
Authors: Shrawan Kumar Trivedi, Shubhamoy Dey
Shrawan Kumar Trivedi, Shubhamoy Dey. Effect of Various Kernels and Feature Selection Methods on SVM Performance for Detecting Email Spams. International Journal of Computer Applications. 66, 21 (March 2013), 18-23. DOI=10.5120/11240-6433
@article{10.5120/11240-6433,
  author    = {Shrawan Kumar Trivedi and Shubhamoy Dey},
  title     = {Effect of Various Kernels and Feature Selection Methods on SVM Performance for Detecting Email Spams},
  journal   = {International Journal of Computer Applications},
  year      = {2013},
  volume    = {66},
  number    = {21},
  pages     = {18-23},
  doi       = {10.5120/11240-6433},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2013
%A Shrawan Kumar Trivedi
%A Shubhamoy Dey
%T Effect of Various Kernels and Feature Selection Methods on SVM Performance for Detecting Email Spams
%J International Journal of Computer Applications
%V 66
%N 21
%P 18-23
%R 10.5120/11240-6433
%I Foundation of Computer Science (FCS), NY, USA
This research presents the effects of the interaction between various kernel functions and feature selection techniques on the learning capability of the Support Vector Machine (SVM) in detecting email spam. The interaction of four SVM kernel functions, i.e. the "Normalised Polynomial Kernel (NP)", "Polynomial Kernel (PK)", "Radial Basis Function Kernel (RBF)", and "Pearson VII Function-Based Universal Kernel (PUK)", with three feature selection techniques, i.e. "Gain Ratio (GR)", "Chi-Squared (χ²)", and "Latent Semantic Indexing (LSI)", has been tested on the "Enron Email Data Set". The results reveal some interesting facts about how the performance of the kernel functions varies with the number of features (or dimensions) in the data. NP performs the best across a wide range of dimensionality for all the feature selection techniques tested. The PUK kernel works well with low-dimensional data and is second best in performance (after NP), but performs poorly on high-dimensional data. Latent Semantic Indexing (LSI) appears to be the best among all the tested feature selection techniques. However, for high-dimensional data, all the feature selection techniques perform almost equally well.
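As a rough illustration of the kind of pipeline the abstract describes, the sketch below pairs two feature selection techniques (Chi-squared and an LSI-style truncated SVD) with two SVM kernels (RBF and polynomial) using scikit-learn. This is not the authors' implementation: the corpus, parameter values, and variable names are placeholders, and the NP and PUK kernels are not built into scikit-learn (though a custom kernel function can be passed to SVC as a callable).

```python
# Illustrative sketch only (not the authors' setup): pairing feature
# selection techniques with different SVM kernels on a tiny placeholder
# email corpus. Requires scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import TruncatedSVD  # LSI-style dimensionality reduction
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Placeholder corpus standing in for the Enron emails (1 = spam, 0 = ham).
texts = [
    "free offer click now to win a prize",
    "cheap meds available order now",
    "meeting agenda attached for tomorrow",
    "please review the quarterly report draft",
]
labels = [1, 1, 0, 0]

def build_pipeline(selector, kernel, **svm_params):
    """Combine one feature-selection step with one SVM kernel."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("reduce", selector),
        ("svm", SVC(kernel=kernel, **svm_params)),
    ])

# Chi-squared keeps the k highest-scoring terms; TruncatedSVD plays the
# role of Latent Semantic Indexing by projecting onto latent "topics".
selectors = {
    "chi2": SelectKBest(chi2, k=4),
    "lsi": TruncatedSVD(n_components=2),
}
# NP and PUK kernels would have to be supplied as custom callables,
# e.g. SVC(kernel=my_puk_function); only built-in kernels are shown here.
kernels = {"rbf": {}, "poly": {"degree": 2}}

for sel_name, selector in selectors.items():
    for ker_name, params in kernels.items():
        clf = build_pipeline(selector, ker_name, **params)
        clf.fit(texts, labels)  # on a real corpus, use cross-validation instead
        print(sel_name, ker_name, "training accuracy:", clf.score(texts, labels))
```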