Efficient Diabetes Prediction Using Random Forests and Minimal Health Indicators on the BRFSS Dataset

Authors

  • Adnan Kutay Yüksel
  • Mehmet Serdar Güzel

DOI:

https://doi.org/10.5281/zenodo.16234622

Keywords:

Diabetes Prediction, Random Forest, Feature Selection, BRFSS Dataset, Healthcare Analytics

Abstract

Early detection of diabetes allows public health systems to intervene in time. In this study, we use the balanced binary version of the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset to build a Random Forest classifier for diabetes prediction. Starting from all 21 features, we compare default, weighted, and hyperparameter-tuned models. We then apply feature importance analysis to identify the most significant predictors and retrain the model on a reduced feature set. The tuned Random Forest model achieved an F1-score of 0.762 using all features. Using only four features (GenHlth, HighBP, BMI, and Age), the model still achieved an F1-score of 0.751, a drop of only 0.011. These findings suggest that simpler models built on a few high-impact features can be deployed for diabetes prediction with minimal loss in performance.
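The workflow described in the abstract (fit on all features, rank by importance, refit on the top four) can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn, uses synthetic data from make_classification as a stand-in for the balanced BRFSS table, keeps default hyperparameters instead of the tuned ones, and ranks features by impurity-based importance, which may differ from the importance measure used in the paper. The helper name fit_and_score is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the balanced BRFSS table
# (21 features, roughly 50/50 binary target). Not the real data.
X, y = make_classification(
    n_samples=2000, n_features=21, n_informative=6, n_redundant=2,
    weights=[0.5, 0.5], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def fit_and_score(cols):
    """Fit a Random Forest on the given column indices; return (model, F1)."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train[:, cols], y_train)
    preds = model.predict(X_test[:, cols])
    return model, f1_score(y_test, preds)


# Step 1: baseline model on all 21 features.
all_cols = np.arange(X.shape[1])
full_model, f1_full = fit_and_score(all_cols)

# Step 2: rank features by impurity-based importance, keep the top four.
top4 = np.argsort(full_model.feature_importances_)[::-1][:4]

# Step 3: retrain on the reduced feature set and compare F1-scores.
_, f1_reduced = fit_and_score(top4)

print(f"F1 with all 21 features: {f1_full:.3f}")
print(f"F1 with top 4 features:  {f1_reduced:.3f}")
```

On the real dataset, the column indices in top4 would be replaced by the named columns the abstract reports (GenHlth, HighBP, BMI, Age); the scores printed here come from synthetic data and are not comparable to the paper's 0.762 and 0.751.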

Author Biographies

  • Adnan Kutay Yüksel, Ankara University

    PhD Student, Department of Computer Engineering, Ankara University

  • Mehmet Serdar Güzel, Ankara University

    Professor, Department of Computer Engineering, Ankara University

Published

2025-07-22

Section

Articles

How to Cite

Efficient Diabetes Prediction Using Random Forests and Minimal Health Indicators on the BRFSS Dataset. (2025). The Journal of Artificial Intelligence and Human Sciences, 2(1), 34-43. https://doi.org/10.5281/zenodo.16234622