MOST: Memory Oversubscription-aware Scheduling for Tensor Migration on GPU Unified Storage
  • Kim, Junsu
  • Jeon, Jaebeom
  • Park, Jaeyong
  • Choi, Sangun
  • Gil, Minseong
  • ... Hong, Seokin
  • and 3 others
Citations
  • Web of Science: 2
  • Scopus: 2

Abstract

Deep Neural Network (DNN) training demands memory capacities that exceed the limits of current GPU onboard memory. Expanding GPU memory with SSDs is a cost-effective approach. However, the low bandwidth of SSDs introduces severe performance bottlenecks in data management, particularly for Unified Virtual Memory (UVM)-based systems. The default on-demand migration mechanism in UVM causes frequent page faults and stalls, which are exacerbated by memory oversubscription and by eviction processes on the critical path. To address these challenges, this paper proposes Memory Oversubscription-aware Scheduling for Tensor Migration (MOST), a software framework designed to improve data migration in UVM environments. MOST profiles memory access behavior, quantifies the impact of memory oversubscription stalls, and schedules tensor migrations to minimize overall training time. Using the profiling results, MOST executes newly designed pre-eviction and prefetching instructions within DNN kernel code. By selecting and migrating the tensors that can mitigate memory oversubscription stalls, MOST reduces training time. Our evaluation shows that MOST achieves average speedups of 22.9% and 12.8% over the state-of-the-art techniques DeepUM and G10, respectively. © 2002-2011 IEEE.
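
The following toy model is not the MOST algorithm, which this record does not describe in detail. It is a minimal sketch, under stated assumptions, of why a profiled access trace can reduce migrations when GPU memory is oversubscribed. It assumes unit-sized tensors, a hypothetical four-tensor trace repeated over two iterations, and a GPU that holds three tensors. On-demand migration is modeled with LRU eviction, and the profile-guided variant swaps in Belady's farthest-next-use eviction as a stand-in for schedule-aware pre-eviction. All function and variable names are illustrative.

```python
from collections import OrderedDict


def count_faults_lru(trace, capacity):
    """On-demand migration: every miss is a fault; evict the least recently used tensor."""
    resident = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for tensor in trace:
        if tensor in resident:
            resident.move_to_end(tensor)  # mark as most recently used
            continue
        faults += 1
        if len(resident) >= capacity:
            resident.popitem(last=False)  # evict the least recently used tensor
        resident[tensor] = None
    return faults


def count_faults_profiled(trace, capacity):
    """Profile-guided eviction: evict the tensor whose next use is farthest away."""
    resident = set()
    faults = 0
    for i, tensor in enumerate(trace):
        if tensor in resident:
            continue
        faults += 1
        if len(resident) >= capacity:

            def next_use(t):
                # Position of the next access to t after step i, or infinity if none.
                try:
                    return trace.index(t, i + 1)
                except ValueError:
                    return float("inf")

            resident.remove(max(resident, key=next_use))
        resident.add(tensor)
    return faults


# Hypothetical trace: two training iterations touching the same four tensors in order.
trace = ["a", "b", "c", "d", "a", "b", "c", "d"]
capacity = 3  # the GPU holds three of the four tensors (oversubscribed)

print("on-demand (LRU) faults:", count_faults_lru(trace, capacity))
print("profile-guided faults: ", count_faults_profiled(trace, capacity))
```

On this trace the LRU model faults on all 8 accesses, while the profile-guided model faults 5 times. The gap comes entirely from knowing future accesses, which is the kind of information that profiling provides.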

Keywords

Computer architecture; memory management
Title
MOST: Memory Oversubscription-aware Scheduling for Tensor Migration on GPU Unified Storage
Authors
Kim, Junsu; Jeon, Jaebeom; Park, Jaeyong; Choi, Sangun; Gil, Minseong; Hong, Seokin; Koo, Gunjae; Yoon, Myung Kuk; Oh, Yunho
DOI
10.1109/LCA.2025.3580264
Publication date
2025-07
Type
Article
Journal
IEEE Computer Architecture Letters
Volume 24, Issue 2
Pages
213-216