DPU-Aware Hardware-Software Co-Design for Real-Time Semantic Segmentation
  • Oh, Seung Wook
  • Hwang, Gyu Hyeon
  • Oh, Hobin
  • Sim, Hyeonjin
  • Choi, Min Kwon
  • Jeon, Jae Wook

Abstract

On specialized hardware such as the Xilinx Deep Learning Processing Unit (DPU), the theoretical efficiency of a neural network model often does not translate into real-world performance. This work investigates a hardware-software co-design methodology for adapting lightweight models to the DPU architecture through an exploratory case study. We replaced the encoders of ERFNet and ESNet with the DPU-friendly MobileNetV2 and evaluated performance on the Xilinx Kria KV260 platform. While this co-design strategy boosted ERFNet's DPU-only inference throughput by 2.14x (from 9.05 to 19.36 FPS), the end-to-end system throughput, which includes pre- and post-processing, remained stalled at roughly 0.35 FPS. This outcome provides empirical evidence of a "bottleneck shift," where the performance constraint migrates from the DPU hardware to CPU-bound software routines. Our findings emphasize that achieving true real-time performance in embedded AI systems requires holistic optimization of the entire pipeline, not just the neural network accelerator.
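
The arithmetic behind the "bottleneck shift" can be illustrated with a minimal sketch. It uses only the three figures quoted in the abstract and assumes, purely for illustration, that the pipeline stages run sequentially so that per-frame latencies add up; the paper's actual measurement setup may differ. The names `end_to_end_fps` and `other_seconds` are hypothetical and do not come from the paper.

```python
# Back-of-the-envelope model of the bottleneck shift.
# ASSUMPTION (not from the paper): stages run one after another,
# so per-frame time = DPU time + everything else (CPU pre/post-processing).

DPU_FPS_BEFORE = 9.05   # ERFNet, original encoder (from the abstract)
DPU_FPS_AFTER = 19.36   # ERFNet, MobileNetV2 encoder (from the abstract)
E2E_FPS_AFTER = 0.35    # approximate end-to-end throughput (from the abstract)


def end_to_end_fps(dpu_fps: float, other_seconds: float) -> float:
    """End-to-end throughput when DPU and non-DPU work are strictly sequential."""
    return 1.0 / (other_seconds + 1.0 / dpu_fps)


# Speedup of the DPU-only stage.
dpu_speedup = DPU_FPS_AFTER / DPU_FPS_BEFORE

# Per-frame time spent outside the DPU, inferred from the end-to-end figure.
other_seconds = 1.0 / E2E_FPS_AFTER - 1.0 / DPU_FPS_AFTER

# Fraction of each frame's latency that the DPU accounts for.
dpu_share_after = (1.0 / DPU_FPS_AFTER) / (1.0 / E2E_FPS_AFTER)

# What the same non-DPU cost would imply for the slower encoder.
e2e_fps_before = end_to_end_fps(DPU_FPS_BEFORE, other_seconds)

# Upper bound on throughput even if DPU inference took zero time.
e2e_fps_ceiling = 1.0 / other_seconds

print(f"DPU speedup:             {dpu_speedup:.2f}x")
print(f"non-DPU time per frame:  {other_seconds:.3f} s")
print(f"DPU share of latency:    {dpu_share_after:.1%}")
print(f"implied E2E FPS before:  {e2e_fps_before:.3f}")
print(f"E2E FPS ceiling:         {e2e_fps_ceiling:.3f}")
```

Under this assumption the DPU accounts for only about 2% of the per-frame latency, so doubling its speed barely moves the end-to-end figure, and even an infinitely fast accelerator would leave throughput close to 0.35 FPS. This matches the abstract's conclusion that the remaining cost sits in the CPU-bound routines.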

Keywords

DPU; FPGA; Hardware-Software Co-Design; MobileNetV2; Semantic Segmentation; System Bottleneck; Zynq
Title
DPU-Aware Hardware-Software Co-Design for Real-Time Semantic Segmentation
Authors
Oh, Seung Wook; Hwang, Gyu Hyeon; Oh, Hobin; Sim, Hyeonjin; Choi, Min Kwon; Jeon, Jae Wook
DOI
10.1109/ICCE-Asia67487.2025.11263642
Publication Date
2025
Type
Conference Paper
Journal
2025 IEEE/IEIE International Conference on Consumer Electronics-Asia, ICCE-Asia 2025