International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 31
Published: August 2025
Authors: Emmanuel O. Oshoiribhor, Adetokunbo M. John-Otumu |
Emmanuel O. Oshoiribhor, Adetokunbo M. John-Otumu. An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique. International Journal of Computer Applications 187, 31 (August 2025), 26-35. DOI=10.5120/ijca2025925542
@article{10.5120/ijca2025925542,
  author    = {Emmanuel O. Oshoiribhor and Adetokunbo M. John-Otumu},
  title     = {An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {31},
  pages     = {26-35},
  doi       = {10.5120/ijca2025925542},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2025
%A Emmanuel O. Oshoiribhor
%A Adetokunbo M. John-Otumu
%T An Explainable Random Forest Model for Early Diabetes Prediction Using LIME Interpretability Technique
%J International Journal of Computer Applications
%V 187
%N 31
%P 26-35
%R 10.5120/ijca2025925542
%I Foundation of Computer Science (FCS), NY, USA
This study aims to improve diabetes prediction by integrating Random Forest classifiers with Explainable AI (XAI) methods such as LIME to enhance model interpretability and clinical trust. Using the “diabetes.csv” dataset from Kaggle (768 records with nine clinical features), the research addresses challenges posed by its imbalanced distribution of 500 non-diabetic and 268 diabetic cases. Baseline evaluations showed accuracies of 70% for SVM and 72.07% for Random Forest, with similar precision, recall, F1-scores, and ROC AUC values around 0.81. Applying Random Search for hyperparameter tuning improved Random Forest performance to 75% accuracy, 64% precision, 69% recall, 67% F1-score, and 0.83 ROC AUC. To assess robustness and generalization, a Text-Guided Synthetic Dataset (synthetic_diabetes_data.csv, 35 KB) was generated using ChatGPT, containing 1000 instances (450 non-diabetes, 550 diabetes) with real, integer, and categorical features based on prompt design. Testing on this balanced, diverse dataset yielded higher performance: 93.5% accuracy, 92% precision, 94% recall, 93% F1-score, and 0.95 ROC AUC. LIME explanations provided clear, case-specific insights, aiding clinician understanding and supporting trustworthy decision-making. Human-centered evaluations rated these explanations highly for plausibility, clarity, and clinical usefulness. Despite challenges from data imbalance in real-world settings, the study demonstrates that combining machine learning with explainable AI offers an effective, transparent approach for early diabetes prediction, while highlighting the need for high-quality, diverse datasets to ensure reliable deployment in clinical practice.
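The pipeline the abstract describes, a Random Forest tuned by Random Search and explained case-by-case in the spirit of LIME, can be sketched roughly as follows. This is not the authors' code: the data are a synthetic stand-in for the 768-record Kaggle set, the hyperparameter grid is assumed rather than taken from the paper, and the explanation step is a simplified LIME-style local linear surrogate instead of the `lime` package itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the diabetes data: 768 rows, imbalanced roughly
# 65/35 like the 500/268 split reported in the abstract.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# Random Search over an illustrative hyperparameter grid (values assumed).
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [100, 200, 400],
                         "max_depth": [None, 5, 10],
                         "min_samples_leaf": [1, 2, 4]},
    n_iter=10, cv=5, scoring="roc_auc", random_state=0)
search.fit(X_tr, y_tr)
rf = search.best_estimator_

acc = accuracy_score(y_te, rf.predict(X_te))
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

# LIME-style local explanation for one test case: perturb the instance,
# weight perturbations by proximity to it, and fit a weighted linear
# surrogate whose coefficients rank local feature influence (the real
# LIME library adds discretization and feature selection on top of this).
rng = np.random.default_rng(0)
x0 = X_te[0]
Z = x0 + rng.normal(scale=X_tr.std(axis=0), size=(500, X_tr.shape[1]))
pz = rf.predict_proba(Z)[:, 1]                 # black-box predictions
d = np.linalg.norm(Z - x0, axis=1)
w = np.exp(-(d ** 2) / (2 * d.mean() ** 2))    # proximity kernel weights
surrogate = Ridge(alpha=1.0).fit(Z, pz, sample_weight=w)
top = np.argsort(-np.abs(surrogate.coef_))[:3]  # most influential features
print(f"accuracy={acc:.3f} roc_auc={auc:.3f} top-local-features={top}")
```

The surrogate's coefficients play the role of a LIME explanation for that single patient: they say which features pushed the black-box prediction up or down in the neighborhood of that case, which is the kind of case-specific insight the abstract credits with aiding clinician trust.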