- Omer, Muhammad;
- Ali, Sardar Jaffar;
- Le, Duc-Tai;
- Choo, Hyunseung
Abstract
Diabetic Retinopathy (DR) classification presents a unique challenge in medical imaging: it requires detecting fine-grained lesions such as microaneurysms within highly imbalanced datasets. While Vision Transformers (ViTs) excel in general computer vision, their efficacy relative to Convolutional Neural Networks (CNNs) in low-data medical regimes remains debated. This study benchmarks nine prominent models, developed primarily from around 2020 onwards, across three architectural paradigms: pure CNNs (EfficientNetV2, HRNet, InceptionNeXt), pure Transformers (ViT, Swin, DeiT), and Hybrids (CoAtNet, MaxViT, MobileViT), to evaluate the impact of inductive bias on DR severity grading. All nine architectures are trained on the standardized EyePACS dataset with a consistent pipeline. CNNs proved the most robust, achieving the highest average Quadratic Weighted Kappa (QWK: 0.687) and AUC (0.846), with HRNet emerging as the top model (QWK: 0.70). Statistical analysis via bootstrapping revealed that pure Transformers exhibit significantly wider 95% confidence intervals (Δ 0.06) than CNNs (Δ 0.04), indicating higher instability. While Hybrids achieved the highest Accuracy (75.2%), their lower QWK (0.678) implies reduced consistency in grading severe disease stages. The results demonstrate that the locality inherent to CNNs is crucial for detecting subtle retinal pathologies. While Hybrids offer a middle ground, pure Transformers demonstrate lower reliability in this domain.
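The abstract reports Quadratic Weighted Kappa with bootstrapped 95% confidence intervals but does not describe how they were computed. The following is a minimal sketch of one common way to do it, not the authors' implementation. It assumes five ordinal DR grades (0 to 4) and a percentile bootstrap over test samples. The function names, the resample count, and the toy labels are hypothetical and chosen only for illustration.

```python
import random


def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """QWK for ordinal labels in range(num_classes) (assumed: DR grades 0-4)."""
    n = len(y_true)
    # Observed confusion counts and per-rater label histograms.
    observed = [[0] * num_classes for _ in range(num_classes)]
    hist_true = [0] * num_classes
    hist_pred = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
        hist_true[t] += 1
        hist_pred[p] += 1

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_true[i] * hist_pred[j] / n
    if den == 0:
        # Both label sets collapse onto the same single class; treating this
        # as perfect agreement is a convention chosen for this sketch.
        return 1.0
    return 1.0 - num / den


def bootstrap_qwk_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for QWK (assumed method; resamples test cases)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stats.append(
            quadratic_weighted_kappa(
                [y_true[i] for i in idx], [y_pred[i] for i in idx]
            )
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy labels, for illustration only (not data from the paper).
toy_true = [0, 0, 1, 2, 2, 3, 4, 4, 1, 0, 2, 3]
toy_pred = [0, 1, 1, 2, 3, 3, 4, 3, 1, 0, 2, 2]
point = quadratic_weighted_kappa(toy_true, toy_pred)
low, high = bootstrap_qwk_ci(toy_true, toy_pred)
print(f"QWK = {point:.3f}, 95% CI width = {high - low:.3f}")
```

Under this reading, the Δ values in the abstract would correspond to the interval width `high - low`, so a wider interval means the score varies more across resampled test sets.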
Keywords
- Title
- Inductive Bias Matters: Benchmarking CNNs, Transformers, and Hybrid Architectures for Diabetic Retinopathy Grading
- Authors
- Omer, Muhammad; Ali, Sardar Jaffar; Le, Duc-Tai; Choo, Hyunseung
- Publication date
- 2026
- Type
- Conference Paper
- Journal
- Proceedings of the 2026 20th International Conference on Ubiquitous Information Management and Communication, IMCOM 2026