International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Issue 32
Published: August 2024
Authors: Khaled Sh. Raslan, Almohammady S. Alsharkawy, K.R. Raslan
DOI: 10.5120/ijca2024923849
Khaled Sh. Raslan, Almohammady S. Alsharkawy, K.R. Raslan. iHHO-SMOTe: A Cleansed Approach for Handling Outliers and Reducing Noise to Improve Imbalanced Data Classification. International Journal of Computer Applications. 186, 32 (August 2024), 1-10. DOI=10.5120/ijca2024923849
@article{ 10.5120/ijca2024923849,
author = { Khaled Sh. Raslan and Almohammady S. Alsharkawy and K.R. Raslan },
title = { iHHO-SMOTe: A Cleansed Approach for Handling Outliers and Reducing Noise to Improve Imbalanced Data Classification },
journal = { International Journal of Computer Applications },
year = { 2024 },
volume = { 186 },
number = { 32 },
pages = { 1--10 },
doi = { 10.5120/ijca2024923849 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2024
%A Khaled Sh. Raslan
%A Almohammady S. Alsharkawy
%A K.R. Raslan
%T iHHO-SMOTe: A Cleansed Approach for Handling Outliers and Reducing Noise to Improve Imbalanced Data Classification
%J International Journal of Computer Applications
%V 186
%N 32
%P 1-10
%R 10.5120/ijca2024923849
%I Foundation of Computer Science (FCS), NY, USA
Classifying imbalanced datasets remains a significant challenge in machine learning, particularly with big data, where instances are unevenly distributed among classes and the resulting class imbalance degrades classifier performance. The Synthetic Minority Over-sampling Technique (SMOTE) addresses this challenge by generating new instances for the under-represented minority class, but noise and outliers in the data compromise the samples it creates. This paper proposes iHHO-SMOTe, an approach that addresses these limitations of SMOTE by first cleansing the data of noise points. The cleansing process uses a random forest for feature selection to identify the most valuable features, then applies the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to detect outliers based on the selected features. The outliers identified in the minority classes are removed, producing a refined dataset that is then oversampled with the hybrid iHHO-SMOTe approach. Comprehensive experiments across diverse datasets show strong performance for the proposed model, with an AUC score exceeding 0.99, a G-means score of 0.99 that indicates its robustness, and an F1-score consistently exceeding 0.967. These findings establish Cleansed iHHO-SMOTe as a strong contender for handling imbalanced datasets, with its focus on noise reduction and outlier handling yielding improved classification models.
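The cleansing-then-oversampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details of the HHO component, so plain SMOTE-style interpolation is swapped in, and the random forest feature selection step is omitted. The function names (`dbscan_noise`, `cleanse_minority`, `smote_like`), the parameters `eps`, `min_samples`, and `k`, the toy data, and the choice to run DBSCAN over the whole dataset before dropping only minority noise points are all assumptions made for illustration.

```python
import math
import random


def dbscan_noise(points, eps, min_samples):
    """Return indices that DBSCAN would label as noise.

    A point is noise if it is not a core point and has no core point
    within eps. Neighbourhoods include the point itself.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_samples}
    return {
        i for i in range(n)
        if i not in core and not core.intersection(neighbors[i])
    }


def cleanse_minority(X, y, minority, eps, min_samples):
    """Drop minority-class rows flagged as noise; keep all other rows."""
    noise = dbscan_noise(X, eps, min_samples)
    keep = [i for i in range(len(X)) if not (i in noise and y[i] == minority)]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like(minority_rows, n_new, k, rng):
    """Create n_new points by interpolating between a minority row and
    one of its k nearest minority neighbours (the basic SMOTE step)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        nearest = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(nearest)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical): six majority rows, three clustered minority
# rows, and one isolated minority row that should be treated as noise.
X = [
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0], [0.8, 0.9],
    [3.0, 3.0], [3.1, 2.9], [2.9, 3.1],
    [8.0, 8.0],
]
y = [0] * 6 + [1] * 4

rng = random.Random(0)
X_clean, y_clean = cleanse_minority(X, y, minority=1, eps=0.5, min_samples=3)
minority_rows = [row for row, label in zip(X_clean, y_clean) if label == 1]
n_new = y_clean.count(0) - len(minority_rows)  # balance the two classes
synthetic = smote_like(minority_rows, n_new, k=2, rng=rng)
```

Without the cleansing step, the isolated row at `[8.0, 8.0]` could be picked as an interpolation endpoint, placing synthetic minority samples far from the real minority cluster. Removing it first keeps the synthetic points within the cluster. In practice, library implementations such as scikit-learn's `DBSCAN` and imbalanced-learn's `SMOTE` would likely replace the hand-written helpers here.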