DiaBD: A novel benchmark dataset for diabetes prediction
  • Islam, Tanvir
  • Miah, Abu Saleh Musa
  • Raihan, M.
  • Kabir, Mohammad Iqbal
  • Bijoy, Mehadi Hasan
  • ... Muhammad, Khan
  • and 4 others
Citations: Web of Science 1, Scopus 0

Abstract

Diabetes mellitus (DM) is a global health challenge that requires efficient and accurate prediction models for early diagnosis and management. This research introduces a novel benchmark dataset named "Diabetes in Bangladesh (DiaBD)," created by integrating and optimizing attributes from two real-world datasets curated in southern Bangladesh. The first dataset classifies diabetes symptoms as typical, non-typical, or absent, whereas the second supports binary classification of diabetes. By merging and optimizing their shared attributes, "DiaBD" provides 738 unique diabetic and non-diabetic instances described by 17 essential features, making it suitable for developing and benchmarking machine learning (ML) and deep learning-based predictive models. This study also employs a pipeline with majority-voting-based feature selection and uses the dataset to build a stacked convolutional neural network (Stack-CNN) method for diabetes prediction. A comparative evaluation against state-of-the-art ML, ensemble learning, and neural network approaches showed that the proposed model significantly outperformed traditional ML and ensemble techniques. Specifically, the model achieved an accuracy of 98.09% for diabetes symptom prediction and 98.15% for binary diabetes prediction. Additionally, the proposed model and the generalizability of the feature selection process were validated on an external benchmark dataset, showing consistent performance improvements. These results support the robustness of "DiaBD" and the efficacy of the proposed methods. The findings establish "DiaBD" as a reliable dataset that provides a foundation for future research in intelligent diabetes symptom classification, binary prediction, and precision medicine, thereby supporting smart healthcare management and early diabetes intervention.
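The abstract mentions majority-voting-based feature selection but does not name the underlying selectors, the number of features each one keeps, or the voting threshold. The sketch below illustrates the general idea under stated assumptions only: three simple filter criteria (absolute Pearson correlation, a standardized class-mean gap, and raw variance) each nominate their top `k` features, and a feature is kept when a strict majority of selectors nominate it. The selector functions, `k`, the threshold, and the toy columns `f1` to `f4` are all hypothetical and are not the authors' pipeline or DiaBD data.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical layout: each feature is a named column of numeric values.
Columns = Dict[str, List[float]]
# A selector returns the names of the k features it considers most useful.
Selector = Callable[[Columns, Sequence[int], int], Set[str]]


def _top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring feature names (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:k])


def by_correlation(cols: Columns, y: Sequence[int], k: int) -> Set[str]:
    """Assumed criterion 1: absolute Pearson correlation with the binary label."""
    my, sy = mean(y), pstdev(y)
    scores: Dict[str, float] = {}
    for name, x in cols.items():
        mx, sx = mean(x), pstdev(x)
        if sx == 0 or sy == 0:
            scores[name] = 0.0  # constant column or constant label: no signal
            continue
        cov = mean((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        scores[name] = abs(cov / (sx * sy))
    return _top_k(scores, k)


def by_mean_gap(cols: Columns, y: Sequence[int], k: int) -> Set[str]:
    """Assumed criterion 2: gap between class means, scaled by the column's std."""
    scores: Dict[str, float] = {}
    for name, x in cols.items():
        pos = [xi for xi, yi in zip(x, y) if yi == 1]
        neg = [xi for xi, yi in zip(x, y) if yi == 0]
        sd = pstdev(x)
        if not pos or not neg or sd == 0:
            scores[name] = 0.0  # one class missing or constant column
        else:
            scores[name] = abs(mean(pos) - mean(neg)) / sd
    return _top_k(scores, k)


def by_variance(cols: Columns, y: Sequence[int], k: int) -> Set[str]:
    """Assumed criterion 3: raw variance (ignores the label entirely)."""
    return _top_k({name: pstdev(x) ** 2 for name, x in cols.items()}, k)


def majority_vote(cols: Columns, y: Sequence[int],
                  selectors: Sequence[Selector], k: int) -> List[str]:
    """Keep features nominated by a strict majority of the selectors."""
    votes: Counter = Counter()
    for select in selectors:
        votes.update(select(cols, y, k))  # each selector casts one vote per feature
    needed = len(selectors) // 2 + 1
    return sorted(name for name, count in votes.items() if count >= needed)


# Toy data with hypothetical feature names (not DiaBD columns).
y = [0, 0, 0, 1, 1, 1]
cols = {
    "f1": [1, 2, 1, 8, 9, 8],      # separates the classes well
    "f2": [5, 5, 5, 5, 5, 6],      # weakly related to the label
    "f3": [0, 1, 0, 0, 1, 0],      # unrelated to the label
    "f4": [10, 0, 10, 0, 10, 0],   # high variance, modest relation
}
selected = majority_vote(cols, y, [by_correlation, by_mean_gap, by_variance], k=2)
print(selected)  # ['f1', 'f2']
```

Here `f4` is nominated only by the variance criterion, so it loses the vote, while `f1` (three votes) and `f2` (two votes) survive. A strict majority is just one plausible threshold; with an even number of selectors, a tie-breaking rule would have to be chosen explicitly.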

Keywords

Diabetes prediction; Diabetes mellitus; Smart healthcare; Diabetes symptoms; Machine learning; Artificial neural networks; TYPE-2
Authors
Islam, Tanvir; Miah, Abu Saleh Musa; Raihan, M.; Kabir, Mohammad Iqbal; Bijoy, Mehadi Hasan; Bairagi, Anupam Kumar; Saudagar, Abdul Khader Jilani; Alkhrijah, Yazeed Masaud; Lee, Ik Hyun; Muhammad, Khan
DOI
10.1016/j.aej.2025.08.017
Publication date
2025-11
Type
Article
Journal
AEJ - Alexandria Engineering Journal
Volume
132
Pages
435-455