- Oh, Seung Wook;
- Hwang, Gyu Hyeon;
- Oh, Hobin;
- Sim, Hyeonjin;
- Choi, Min Kwon;
- ... Jeon, Jae Wook
Web of Science: 0
Scopus: 0
Abstract
On specialized hardware like the Xilinx Deep Learning Processing Unit (DPU), the theoretical efficiency of a neural network model often does not translate into real-world performance. This work investigates a hardware-software co-design methodology for adapting lightweight models to the DPU architecture through an exploratory case study. We replaced the encoders of ERFNet and ESNet with the DPU-friendly MobileNetV2 and evaluated the performance on the Xilinx Kria KV260 platform. While this co-design strategy boosted ERFNet's DPU-only inference throughput by 2.14x (from 9.05 to 19.36 FPS), the end-to-end system throughput, including pre- and post-processing, remained stalled at roughly 0.35 FPS. This outcome provides empirical evidence of a "bottleneck shift," where the performance constraint migrates from the DPU hardware to CPU-bound software routines. Our findings emphasize that achieving true real-time performance in embedded AI systems requires a holistic optimization of the entire pipeline, not just the neural network accelerator.
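The arithmetic behind the "bottleneck shift" can be sketched from the three figures quoted in the abstract. This is a minimal illustration, not the authors' measurement code: it assumes the pipeline runs its stages serially (per-frame time = CPU pre/post-processing time + DPU time) and that the 0.35 FPS end-to-end figure applies to the MobileNetV2-encoder configuration. The function names are hypothetical.

```python
# Figures quoted in the abstract (frames per second).
DPU_FPS_BASELINE = 9.05    # ERFNet, DPU-only, original encoder
DPU_FPS_CODESIGN = 19.36   # ERFNet, DPU-only, MobileNetV2 encoder
E2E_FPS = 0.35             # end-to-end, including pre- and post-processing


def fps_to_ms(fps: float) -> float:
    """Convert throughput in FPS to per-frame latency in milliseconds."""
    return 1000.0 / fps


def dpu_share(dpu_fps: float, e2e_fps: float) -> float:
    """Fraction of end-to-end per-frame time spent on the DPU.

    ASSUMPTION: stages run one after another, with no overlap.
    """
    return fps_to_ms(dpu_fps) / fps_to_ms(e2e_fps)


def e2e_fps_ceiling(dpu_fps: float, e2e_fps: float) -> float:
    """End-to-end FPS if DPU time dropped to zero (same serial assumption)."""
    cpu_ms = fps_to_ms(e2e_fps) - fps_to_ms(dpu_fps)
    return 1000.0 / cpu_ms


speedup = DPU_FPS_CODESIGN / DPU_FPS_BASELINE
share = dpu_share(DPU_FPS_CODESIGN, E2E_FPS)
ceiling = e2e_fps_ceiling(DPU_FPS_CODESIGN, E2E_FPS)

print(f"DPU-only speedup:           {speedup:.2f}x")
print(f"DPU share of frame time:    {share:.1%}")
print(f"End-to-end FPS with 0 ms DPU: {ceiling:.3f}")
```

Under these assumptions the DPU accounts for under 2% of the per-frame time, so even an infinitely fast accelerator would leave end-to-end throughput close to 0.35 FPS. That is consistent with the abstract's conclusion that the remaining cost lies in CPU-bound routines.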
- Title: DPU-Aware Hardware-Software Co-Design for Real-Time Semantic Segmentation
- Authors: Oh, Seung Wook; Hwang, Gyu Hyeon; Oh, Hobin; Sim, Hyeonjin; Choi, Min Kwon; Jeon, Jae Wook
- Issue Date: 2025
- Type: Conference Paper
- Journal: 2025 IEEE/IEIE International Conference on Consumer Electronics-Asia, ICCE-Asia 2025