REMED-T2D: A robust ensemble learning model for early detection of type 2 diabetes using healthcare dataset
- Authors
- Phan, Le Thi; Rakkiyappan, Rajan; Manavalan, Balachandran
- Issue Date
- Mar-2025
- Publisher
- Elsevier Ltd
- Keywords
- Diabetes; Ensemble learning; Machine learning; Pima Indian diabetes; Random sampling technique
- Citation
- Computers in Biology and Medicine, v.187
- Indexed
- SCIE; SCOPUS
- Journal Title
- Computers in Biology and Medicine
- Volume
- 187
- URI
- https://scholarx.skku.edu/handle/2021.sw.skku/120479
- DOI
- 10.1016/j.compbiomed.2025.109771
- ISSN
- 0010-4825; 1879-0534
- Abstract
- Early diagnosis and timely treatment of diabetes are critical for effective disease management and the prevention of complications. Undiagnosed diabetes can lead to an increased risk of several health issues. Although numerous machine learning (ML) models have been designed to detect diabetes, many exhibit unsatisfactory performance, are not publicly available, and lack validation on external datasets. To address these limitations, we have developed REMED-T2D, an advanced ensemble ML approach that enhances predictive accuracy and robustness through the integration of diverse ML algorithms. Our approach involves rigorous data preprocessing and a systematic evaluation of 20 different algorithms, encompassing both conventional ML and deep learning for diabetes prediction. First, we applied an under-sampling approach to the imbalanced Pima Indian Diabetes dataset and generated five balanced datasets. Using these datasets, we investigated various computational strategies to select the optimal model for accurate diabetes classification. Our results demonstrate that REMED-T2D outperformed state-of-the-art methods on the training dataset, with notable improvements in accuracy (ACC; 1.40–4.60%) and Matthews correlation coefficient (MCC; 3.50–9.80%). Extensive external validations revealed that the model trained on a five-feature subset achieved ACC of 92.61% and 92.26% on the RTML1 and Pabna datasets, respectively. Moreover, a model based on a seven-feature subset improved ACC by 2.80% and MCC by 13.27% on the RTML2 dataset. These results suggest the potential of REMED-T2D to predict diabetes in Asian females. Notably, this is the first study to conduct such a comprehensive analysis using the Pima dataset, incorporating a diverse set of ML algorithms. Furthermore, we have developed a publicly accessible web server (https://balalab-skku.org/REMED-T2D/) to facilitate self-monitoring and timely medical interventions. We believe REMED-T2D will assist healthcare professionals in detecting diabetes earlier and implementing preventive measures, ultimately improving health outcomes for those at risk. © 2025 Elsevier Ltd
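The under-sampling step and the two reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the five balanced datasets were drawn, so the sketch assumes that every minority-class record is kept and that an equally sized random sample of the majority class is drawn independently for each subset. The function names `balanced_subsets` and `acc_mcc`, the seed, and the toy labels are all hypothetical.

```python
import math
import random


def balanced_subsets(labels, n_subsets=5, seed=0):
    """Return n_subsets lists of row indices with equal class counts.

    Assumption (not stated in the abstract): all minority-class rows are
    kept, and the majority class is randomly under-sampled without
    replacement, independently for each subset.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + sampled))
    return subsets


def acc_mcc(y_true, y_pred):
    """Compute accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Hypothetical toy labels: 10 positive and 30 negative records.
    toy_labels = [1] * 10 + [0] * 30
    for idx in balanced_subsets(toy_labels):
        print(len(idx), sum(toy_labels[i] for i in idx))
    print(acc_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

MCC is reported alongside ACC because it uses all four cells of the confusion matrix, which makes it more informative than accuracy alone when class sizes differ.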
- Appears in Collections
- Biotechnology and Bioengineering > Integrative Biotechnology > 1. Journal Articles
