On-device stereo matching using NPU acceleration for real-time depth estimation

Abstract

We study NPU-only stereo matching, in which the entire pipeline runs on the neural processing unit (NPU), on a Qualcomm Snapdragon 8 Elite system-on-chip (SoC) for depth-based mobile applications. Running stereo matching entirely on the NPU decouples depth estimation from CPU/GPU workloads such as graphics or augmented reality (AR), improving overall system efficiency. Unlike many "mobile-friendly" stereo matching models that are evaluated on desktop GPUs or on mixed CPU/GPU/NPU pipelines, we focus on a lightweight stereo matching model under realistic NPU constraints. Starting from a floating-point baseline, we apply a post-training quantization and layout optimization scheme that preserves disparity accuracy while reducing NPU latency by about 33% at a spatial resolution of 960×540. We also observe that the same network cannot be compiled directly at higher resolutions because of the NPU's limited on-chip memory. To work around this constraint, we split each high-resolution stereo input into two overlapping tiles, process each tile separately, and stitch the outputs back together. This two-tile scheme keeps disparity accuracy close to that of full-frame inference while enabling high-resolution NPU-only stereo matching.
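
The two-tile idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the split direction, the overlap size, or the stitching rule, so the top/bottom split, the `overlap` parameter, and the linear blend in the overlap band are all assumptions made for illustration. The callable `infer` is a hypothetical stand-in for the compiled NPU model.

```python
import numpy as np

def two_tile_disparity(left, right, infer, overlap=32):
    """Run `infer` on two overlapping top/bottom tiles and stitch the outputs.

    Assumptions (not from the paper): tiles are split along the image height,
    `infer(left_tile, right_tile)` returns a disparity map with the same number
    of rows as its inputs, and the overlap band is merged with a linear blend.
    """
    h = left.shape[0]
    half = h // 2
    top_end = half + overlap // 2     # top tile covers rows [0, top_end)
    bot_start = half - overlap // 2   # bottom tile covers rows [bot_start, h)

    d_top = np.asarray(infer(left[:top_end], right[:top_end]), dtype=np.float32)
    d_bot = np.asarray(infer(left[bot_start:], right[bot_start:]), dtype=np.float32)

    out = np.empty((h,) + d_top.shape[1:], dtype=np.float32)
    out[:bot_start] = d_top[:bot_start]           # rows seen only by the top tile
    out[top_end:] = d_bot[top_end - bot_start:]   # rows seen only by the bottom tile

    # Blend the rows covered by both tiles; the weight ramps from top to bottom.
    n = top_end - bot_start
    w = np.linspace(0.0, 1.0, n, dtype=np.float32)
    w = w.reshape((n,) + (1,) * (d_top.ndim - 1))
    out[bot_start:top_end] = (1.0 - w) * d_top[bot_start:top_end] + w * d_bot[:n]
    return out

def demo_infer(left_tile, right_tile):
    # Hypothetical per-pixel stand-in for the network, used only for the demo.
    return left_tile.astype(np.float32) - right_tile.astype(np.float32)

left_img = np.arange(20 * 8, dtype=np.float32).reshape(20, 8)
right_img = np.ones((20, 8), dtype=np.float32)
stitched = two_tile_disparity(left_img, right_img, demo_infer, overlap=4)
full_frame = demo_infer(left_img, right_img)
```

With the per-pixel `demo_infer`, the stitched result matches full-frame inference. A real network has a finite receptive field, so its predictions near a tile border can differ from the full-frame ones, which is the reason for overlapping the tiles in the first place.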

Keywords

Stereo matching; neural processing unit; quantization; mobile SoC; tiled inference; depth estimation
Title
On-device stereo matching using NPU acceleration for real-time depth estimation
Authors
Park, Mingi; Jeon, Byeungwoo
DOI
10.1117/12.3102515
Publication date
2026
Type
Proceedings Paper
Journal name
INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY, IWAIT 2026
14072