International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 31
Published: August 2025
Authors: Emmanuel O. Oshoiribhor, Adetokunbo M. John-Otumu |
Emmanuel O. Oshoiribhor, Adetokunbo M. John-Otumu. An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique. International Journal of Computer Applications 187, 31 (August 2025), 26-35. DOI=10.5120/ijca2025925542
@article{10.5120/ijca2025925542,
  author    = {Emmanuel O. Oshoiribhor and Adetokunbo M. John-Otumu},
  title     = {An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {31},
  pages     = {26-35},
  doi       = {10.5120/ijca2025925542},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2025
%A Emmanuel O. Oshoiribhor
%A Adetokunbo M. John-Otumu
%T An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique
%J International Journal of Computer Applications
%V 187
%N 31
%P 26-35
%R 10.5120/ijca2025925542
%I Foundation of Computer Science (FCS), NY, USA
This study aims to improve diabetes prediction by integrating Random Forest classifiers with Explainable AI (XAI) methods such as LIME to enhance model interpretability and clinical trust. Using the “diabetes.csv” dataset from Kaggle (768 records with nine clinical features), the research addresses challenges posed by its imbalanced distribution of 500 non-diabetic and 268 diabetic cases. Baseline evaluations showed accuracies of 70% for SVM and 72.07% for Random Forest, with similar precision, recall, F1-scores, and ROC AUC values around 0.81. Applying Random Search for hyperparameter tuning improved Random Forest performance to 75% accuracy, 64% precision, 69% recall, 67% F1-score, and 0.83 ROC AUC. To assess robustness and generalization, a Text-Guided Synthetic Dataset (synthetic_diabetes_data.csv, 35 KB) was generated using ChatGPT, containing 1000 instances (450 non-diabetes, 550 diabetes) with real, integer, and categorical features based on prompt design. Testing on this balanced, diverse dataset yielded higher performance: 93.5% accuracy, 92% precision, 94% recall, 93% F1-score, and 0.95 ROC AUC. LIME explanations provided clear, case-specific insights, aiding clinician understanding and supporting trustworthy decision-making. Human-centered evaluations rated these explanations highly for plausibility, clarity, and clinical usefulness. Despite challenges from data imbalance in real-world settings, the study demonstrates that combining machine learning with explainable AI offers an effective, transparent approach for early diabetes prediction, while highlighting the need for high-quality, diverse datasets to ensure reliable deployment in clinical practice.
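The pipeline the abstract describes, a Random Forest tuned by Random Search and explained case-by-case in the spirit of LIME, can be sketched roughly as follows. This is not the authors' code: the data are a synthetic stand-in for the 768-record Kaggle set, the hyperparameter grid is assumed rather than taken from the paper, and the explanation step is a simplified LIME-style local linear surrogate instead of the `lime` package itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the diabetes data: 768 rows, imbalanced roughly
# 65/35 like the 500/268 split reported in the abstract.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# Random Search over an illustrative hyperparameter grid (values assumed).
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [100, 200, 400],
                         "max_depth": [None, 5, 10],
                         "min_samples_leaf": [1, 2, 4]},
    n_iter=10, cv=5, scoring="roc_auc", random_state=0)
search.fit(X_tr, y_tr)
rf = search.best_estimator_

acc = accuracy_score(y_te, rf.predict(X_te))
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

# LIME-style local explanation for one test case: perturb the instance,
# weight perturbations by proximity to it, and fit a weighted linear
# surrogate whose coefficients rank local feature influence (the real
# LIME library adds discretization and feature selection on top of this).
rng = np.random.default_rng(0)
x0 = X_te[0]
Z = x0 + rng.normal(scale=X_tr.std(axis=0), size=(500, X_tr.shape[1]))
pz = rf.predict_proba(Z)[:, 1]                 # black-box predictions
d = np.linalg.norm(Z - x0, axis=1)
w = np.exp(-(d ** 2) / (2 * d.mean() ** 2))    # proximity kernel weights
surrogate = Ridge(alpha=1.0).fit(Z, pz, sample_weight=w)
top = np.argsort(-np.abs(surrogate.coef_))[:3]  # most influential features
print(f"accuracy={acc:.3f} roc_auc={auc:.3f} top-local-features={top}")
```

The surrogate's coefficients play the role of a LIME explanation for that single patient: they say which features pushed the black-box prediction up or down in the neighborhood of that case, which is the kind of case-specific insight the abstract credits with aiding clinician trust.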