Research Article

An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique

by  Emmanuel O. Oshoiribhor, Adetokunbo M. John-Otumu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 31
Published: August 2025
DOI: 10.5120/ijca2025925542

Emmanuel O. Oshoiribhor, Adetokunbo M. John-Otumu. An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique. International Journal of Computer Applications. 187, 31 (August 2025), 26-35. DOI=10.5120/ijca2025925542

@article{10.5120/ijca2025925542,
  author    = {Emmanuel O. Oshoiribhor and Adetokunbo M. John-Otumu},
  title     = {An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {31},
  pages     = {26-35},
  doi       = {10.5120/ijca2025925542},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
Abstract

This study aims to improve diabetes prediction by integrating Random Forest classifiers with Explainable AI (XAI) methods such as LIME to enhance model interpretability and clinical trust. Using the “diabetes.csv” dataset from Kaggle (768 records with nine clinical features), the research addresses the challenges posed by its imbalanced class distribution of 500 non-diabetic and 268 diabetic cases. Baseline evaluations showed accuracies of 70% for SVM and 72.07% for Random Forest, with similar precision, recall, and F1-scores, and ROC AUC values around 0.81. Hyperparameter tuning with Random Search improved Random Forest performance to 75% accuracy, 64% precision, 69% recall, 67% F1-score, and 0.83 ROC AUC. To assess robustness and generalization, a Text-Guided Synthetic Dataset (synthetic_diabetes_data.csv, 35 KB) was generated using ChatGPT, containing 1000 instances (450 non-diabetic, 550 diabetic) with real-valued, integer, and categorical features specified through prompt design. Testing on this more balanced, diverse dataset yielded higher performance: 93.5% accuracy, 92% precision, 94% recall, 93% F1-score, and 0.95 ROC AUC. LIME explanations provided clear, case-specific insights that aid clinician understanding and support trustworthy decision-making. Human-centered evaluations rated these explanations highly for plausibility, clarity, and clinical usefulness. Despite the challenges of data imbalance in real-world settings, the study demonstrates that combining machine learning with explainable AI offers an effective, transparent approach to early diabetes prediction, while highlighting the need for high-quality, diverse datasets to ensure reliable deployment in clinical practice.
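
As a reading aid, the class balance and F1-scores quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is not the authors' code; the helper names `f1_score` and `minority_share` are hypothetical, and the inputs are the rounded percentages from the abstract, so a recomputed F1 may differ from the reported value in the last digit (the tuned model's rounded precision and recall give roughly 0.66 against the reported 67%, which is presumably a rounding effect).

```python
# Illustrative arithmetic only: inputs are the rounded figures quoted in the
# abstract, and both helper functions are hypothetical names for this sketch.

def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def minority_share(n_negative: int, n_positive: int) -> float:
    """Return the fraction of records that belong to the smaller class."""
    total = n_negative + n_positive
    return min(n_negative, n_positive) / total


# Class balance: Kaggle data (500 non-diabetic, 268 diabetic) versus the
# synthetic data (450 non-diabetic, 550 diabetic).
kaggle_share = minority_share(500, 268)
synthetic_share = minority_share(450, 550)

# F1 recomputed from the rounded precision and recall in the abstract.
tuned_f1 = f1_score(0.64, 0.69)       # tuned Random Forest, Kaggle data
synthetic_f1 = f1_score(0.92, 0.94)   # Random Forest, synthetic data

print(f"Minority-class share, Kaggle:    {kaggle_share:.3f}")
print(f"Minority-class share, synthetic: {synthetic_share:.3f}")
print(f"F1 from rounded inputs, tuned:     {tuned_f1:.3f}")
print(f"F1 from rounded inputs, synthetic: {synthetic_f1:.3f}")
```

The minority class makes up about 35% of the Kaggle records and 45% of the synthetic records, which is the sense in which the synthetic dataset is the more balanced of the two.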

Index Terms
Computer Science
Information Sciences
Keywords

Diabetes Prediction, Random Forest, LIME, Model Interpretability, Explainable AI, Hyperparameter Optimization, Healthcare Analytics
