Research Article

A Machine Learning Approach for Optimized Heart Disease Diagnosis with SMOTE and Voting Classifiers

by A.S.M. Sabiqul Hassan, Tanzina Tazreen Meem, Md. Ruhul Amin, Tasniah Mohiuddin, Muhammed Samsuddoha Alam, Mst. Najnin Sultana
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 64
Published: December 2025
10.5120/ijca2025926079

A.S.M. Sabiqul Hassan, Tanzina Tazreen Meem, Md. Ruhul Amin, Tasniah Mohiuddin, Muhammed Samsuddoha Alam, Mst. Najnin Sultana. A Machine Learning Approach for Optimized Heart Disease Diagnosis with SMOTE and Voting Classifiers. International Journal of Computer Applications. 187, 64 (December 2025), 30-36. DOI=10.5120/ijca2025926079

@article{10.5120/ijca2025926079,
  author    = {A.S.M. Sabiqul Hassan and Tanzina Tazreen Meem and Md. Ruhul Amin and Tasniah Mohiuddin and Muhammed Samsuddoha Alam and Mst. Najnin Sultana},
  title     = {A Machine Learning Approach for Optimized Heart Disease Diagnosis with SMOTE and Voting Classifiers},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {64},
  pages     = {30-36},
  doi       = {10.5120/ijca2025926079},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2025
%A A.S.M. Sabiqul Hassan
%A Tanzina Tazreen Meem
%A Md. Ruhul Amin
%A Tasniah Mohiuddin
%A Muhammed Samsuddoha Alam
%A Mst. Najnin Sultana
%T A Machine Learning Approach for Optimized Heart Disease Diagnosis with SMOTE and Voting Classifiers
%J International Journal of Computer Applications
%V 187
%N 64
%P 30-36
%R 10.5120/ijca2025926079
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Heart disease is a leading cause of death worldwide. According to the World Health Organization (WHO), 17.9 million people die from heart disease each year. This study develops a machine learning model that applies the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance in the dataset and uses ensemble learning to improve the performance and reliability of heart disease diagnosis. The work uses a publicly available dataset from the Kaggle repository that contains attributes relevant to heart disease patients. Five base models, Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), were trained and evaluated in terms of Accuracy, Precision, Recall, F1-score, and ROC-AUC. SMOTE was applied to address the class imbalance, and Soft Voting and Hard Voting classifiers were then used to combine all base classifiers. The Soft Voting classifier achieved the best result: an Accuracy of 70.5%, Precision of 69.8%, Recall of 72.2%, F1-score of 71%, and ROC-AUC of 77%. The resulting model can serve as a decision-support tool in the healthcare sector for the early diagnosis of heart disease, so that preventive steps can follow.
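
The two techniques named in the abstract can be illustrated with a small sketch. It is not the authors' implementation, which this page does not show. It uses only the Python standard library, and the function names (`smote`, `hard_vote`, `soft_vote`), the four minority points, and the three class probabilities are all hypothetical values chosen for illustration. SMOTE is reduced here to its core idea of interpolating between a minority sample and one of its nearest minority neighbours. Hard voting takes the majority of predicted labels, and soft voting averages the predicted probabilities.

```python
import random
from collections import Counter


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Nearest minority neighbours of x by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def hard_vote(labels):
    """Majority vote over the class labels predicted by each model."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas):
    """Average the positive-class probabilities and threshold at 0.5."""
    mean_p = sum(probas) / len(probas)
    return (1 if mean_p >= 0.5 else 0), mean_p


# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
synthetic = smote(minority, n_new=4)

# Hypothetical positive-class probabilities from three base classifiers.
probas = [0.9, 0.4, 0.45]
hard = hard_vote([1 if p >= 0.5 else 0 for p in probas])
soft, mean_p = soft_vote(probas)
print(hard, soft, round(mean_p, 3))  # 0 1 0.583
```

With these made-up probabilities the two schemes disagree: two of the three models predict the negative class, so hard voting returns 0, while the one confident model lifts the average probability above 0.5, so soft voting returns 1. This is one way soft voting can differ from hard voting when the base models output probabilities of varying confidence.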

Index Terms
Computer Science
Information Sciences
Keywords

Heart Disease Diagnosis, Classification, Class Imbalance, Ensemble Learning, Decision Making
