Efficient Diabetes Prediction Using Random Forests and Minimal Health Indicators on the BRFSS Dataset

Authors

Adnan Kutay Yüksel, Mehmet Serdar Güzel

DOI:

https://doi.org/10.5281/zenodo.16234622

Keywords:

Diabetes Prediction, Random Forest, Feature Selection, BRFSS Dataset, Healthcare Analytics

Abstract

Early detection of diabetes is crucial for public health systems to implement timely interventions. In this study, we use the balanced binary version of the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset to build a Random Forest classifier for diabetes prediction. Starting from all 21 features, we compare default, weighted, and hyperparameter-tuned models. We then apply feature importance analysis to identify the most significant predictors and retrain the model on a reduced feature set. The tuned Random Forest model achieved an F1-score of 0.762 using all features. Using only four features (GenHlth, HighBP, BMI, and Age), the model still achieved an F1-score of 0.751, a drop of 0.011. These findings suggest that simpler models built on a few high-impact features can be deployed for diabetes prediction with only a marginal loss in performance.
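
The workflow summarized in the abstract (train a Random Forest on all features, rank features by importance, retrain on the top few, and compare F1-scores) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn is available, uses a synthetic 21-column dataset from `make_classification` as a stand-in for the BRFSS table, and uses placeholder column names (`feature_0` ... `feature_20`) and default hyperparameters, so its scores will not match those reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the balanced binary BRFSS table (assumption):
# 21 columns, of which only a few carry signal.
X, y = make_classification(
    n_samples=2000, n_features=21, n_informative=4, n_redundant=0,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def fit_and_score(columns):
    """Train a forest on the given column indices; return (model, test F1)."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train[:, columns], y_train)
    preds = model.predict(X_test[:, columns])
    return model, f1_score(y_test, preds)


# Step 1: baseline model on all 21 features.
all_columns = list(range(X.shape[1]))
full_model, f1_full = fit_and_score(all_columns)

# Step 2: rank features by impurity-based importance and keep the top four.
ranking = np.argsort(full_model.feature_importances_)[::-1]
top_columns = sorted(ranking[:4].tolist())

# Step 3: retrain on the reduced feature set and compare.
_, f1_reduced = fit_and_score(top_columns)

print("top features:", [feature_names[i] for i in top_columns])
print(f"F1 (all 21 features): {f1_full:.3f}")
print(f"F1 (top 4 features):  {f1_reduced:.3f}")
```

With the real data, `X` and `y` would be replaced by the BRFSS feature matrix and diabetes label, and the weighted and tuned variants could be explored through `RandomForestClassifier` parameters such as `class_weight` and a search over `n_estimators` or `max_depth`.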

Author Biographies

  • Adnan Kutay Yüksel, Ankara University

    PhD Student, Department of Computer Engineering, Ankara University

  • Mehmet Serdar Güzel, Ankara University

    Professor, Department of Computer Engineering, Ankara University

Published

2025-07-22

Issue

Section

Articles

How to Cite

Yüksel, A. K., & Güzel, M. S. (2025). Efficient Diabetes Prediction Using Random Forests and Minimal Health Indicators on the BRFSS Dataset. Yapay Zeka Ve İnsan Bilimleri Dergisi, 2(1), 34-43. https://doi.org/10.5281/zenodo.16234622