Efficient Diabetes Prediction Using Random Forests and Minimal Health Indicators on the BRFSS Dataset
DOI: https://doi.org/10.5281/zenodo.16234622
Keywords: Diabetes Prediction, Random Forest, Feature Selection, BRFSS Dataset, Healthcare Analytics
Abstract
Early detection of diabetes is crucial for public health systems to implement timely interventions. In this study, we use the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset, specifically its balanced binary version, to build a Random Forest classifier for diabetes prediction. Starting from all 21 features, we compare default, weighted, and hyperparameter-tuned models. We then apply feature importance analysis to isolate the most significant predictors and retrain the model on a reduced feature set. Our tuned Random Forest model achieved an F1-score of 0.762 using all features. Notably, using only four features (GenHlth, HighBP, BMI, and Age), the model still achieved an F1-score of 0.751, a drop of only 0.011. These findings suggest that simpler models using fewer but high-impact features can be deployed effectively for diabetes prediction with minimal loss in performance.
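The workflow described in the abstract (default, weighted, and tuned Random Forests on all features, followed by importance-based selection and retraining on the top four) can be sketched roughly as follows. This is a minimal sketch assuming scikit-learn, not the authors' code: the data are a synthetic stand-in generated with `make_classification` (21 features, balanced classes) rather than the BRFSS file, the `class_weight="balanced"` option is an assumed interpretation of "weighted", the hyperparameter grid is hypothetical, and the impurity-based `feature_importances_` attribute is an assumed choice of importance measure. The helper `evaluate` and the `feature_i` names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the balanced binary BRFSS data (assumption):
# 21 features and roughly 50/50 classes. Real BRFSS columns would go here.
X, y = make_classification(n_samples=1000, n_features=21, n_informative=6,
                           weights=[0.5, 0.5], random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)


def evaluate(model, cols):
    """Hypothetical helper: fit on the training split restricted to `cols`, return test F1."""
    model.fit(X_train[:, cols], y_train)
    return f1_score(y_test, model.predict(X_test[:, cols]))


all_cols = list(range(X.shape[1]))
results = {}

# 1) Default Random Forest on all features.
results["default"] = evaluate(RandomForestClassifier(random_state=0), all_cols)

# 2) Weighted Random Forest; "balanced" reweights samples by inverse class
#    frequency (an assumed reading of "weighted").
results["weighted"] = evaluate(
    RandomForestClassifier(class_weight="balanced", random_state=0), all_cols)

# 3) Hyperparameter-tuned Random Forest; this small grid is hypothetical,
#    not the search space used in the paper.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10],
                "min_samples_leaf": [1, 5]},
    scoring="f1", cv=3)
grid.fit(X_train, y_train)
best = grid.best_estimator_
results["tuned"] = f1_score(y_test, best.predict(X_test))

# 4) Rank features by impurity-based importance and keep the top k.
k = 4
ranked = sorted(all_cols, key=lambda i: best.feature_importances_[i], reverse=True)
top_k = ranked[:k]
print("Top features:", [feature_names[i] for i in top_k])

# 5) Retrain the tuned configuration on the reduced feature set only.
reduced = RandomForestClassifier(random_state=0, **grid.best_params_)
results[f"tuned_top{k}"] = evaluate(reduced, top_k)

for name, score in results.items():
    print(f"{name:>12}: F1 = {score:.3f}")
```

On synthetic data the selected features and scores will not match the paper's; with the real BRFSS columns, step 4 is where predictors such as GenHlth, HighBP, BMI, and Age would be expected to surface if the importance measure agrees with the authors'.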
License
Copyright (c) 2025 Adnan Kutay Yüksel, Mehmet Serdar Güzel

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.