- Park, Minseon;
- Shin, Jitae
Abstract
Optical flow estimation plays a critical role in various computer vision tasks, including video understanding and autonomous driving. Recent models such as RAFT and FlowFormer refine flow predictions iteratively using recurrent modules based on the Convolutional Gated Recurrent Unit (ConvGRU). However, ConvGRU has limitations in modeling long-range dependencies and requires a large number of parameters for decoder refinement. In this paper, we propose replacing the ConvGRU module in FlowFormer's decoder with Mamba, a state space sequence model optimized for efficient and expressive temporal modeling. Additionally, we introduce a multi-scale loss structure that incorporates low-resolution supervision to encourage global motion consistency and improve training stability. Our method maintains the original input structure of FlowFormer while improving both temporal modeling and multi-scale learning. Experiments on the KITTI benchmark show that our Mamba-based decoder achieves significant improvements over the original FlowFormer, reducing average end-point error (AEPE) by 5.81% and F1-All by 13.41%, while also reducing decoder parameters by 32.65% and FLOPs by 22.88%. These results demonstrate that Mamba, combined with multi-scale loss, is a strong and lightweight alternative to ConvGRU for optical flow refinement.
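The abstract names two quantities without defining them: the average end-point error (AEPE) and a multi-scale loss with low-resolution supervision. The sketch below illustrates one plausible reading of both. It is not the authors' implementation. The average-pooling downsampler, the Euclidean error term, the scale factors, and the weights are all assumptions made for illustration, and the function names `aepe`, `downsample`, and `multi_scale_loss` are hypothetical.

```python
import math

def aepe(pred, gt):
    """Average end-point error: mean Euclidean distance between
    predicted and ground-truth flow vectors (H x W grids of (u, v))."""
    total, count = 0.0, 0
    for row_p, row_g in zip(pred, gt):
        for (pu, pv), (gu, gv) in zip(row_p, row_g):
            total += math.hypot(pu - gu, pv - gv)
            count += 1
    return total / count

def downsample(flow, factor):
    """Average-pool a flow field by `factor` (an assumed choice) and divide
    the vectors by `factor` so they are expressed in coarse-grid pixels."""
    h, w = len(flow) // factor, len(flow[0]) // factor
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            block = [flow[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            u = sum(b[0] for b in block) / len(block) / factor
            v = sum(b[1] for b in block) / len(block) / factor
            row.append((u, v))
        out.append(row)
    return out

def multi_scale_loss(preds, gt, weights):
    """Weighted sum of per-scale errors.
    preds:   dict mapping downsampling factor -> predicted flow at that scale
    weights: dict mapping the same factors -> loss weight (assumed values)"""
    loss = 0.0
    for factor, pred in preds.items():
        target = gt if factor == 1 else downsample(gt, factor)
        loss += weights[factor] * aepe(pred, target)
    return loss
```

For example, with a 4 x 4 ground-truth field, a full-resolution prediction under key `1`, and a half-resolution prediction under key `2`, calling `multi_scale_loss(preds, gt, {1: 1.0, 2: 0.5})` adds the coarse-scale error at half the weight of the full-resolution error. The coarse term penalizes large-scale motion errors that per-pixel supervision alone may weight weakly.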
- Title
- Efficient Recurrent Optical Flow Refinement Using Mamba and Multi-Scale Loss
- Authors
- Park, Minseon; Shin, Jitae
- Issue Date
- 2025
- Type
- Conference Paper
- Journal Title
- 2025 International Technical Conference on Circuits/Systems, Computers, and Communications, ITC-CSCC 2025