ETA: Efficient Transformer Attention Mapping for ReRAM-Based Compute-In-Memory Architectures

Abstract

Transformer models have set new performance benchmarks in vision and language applications. However, their attention mechanisms remain poorly suited to Compute-In-Memory (CIM) architectures because of frequent memory writes and compute-write dependencies. These limitations often lead to increased latency and inefficient resource use. In this work, we propose Efficient Transformer Attention mapping (ETA), a novel approach optimized for ReRAM-based CIM systems. ETA alleviates the compute-write bottleneck by enabling parallel execution of computation and memory writes, and reduces the number of required arrays through an array-aware mapping strategy. This dual optimization leads to significant improvements in both energy efficiency and latency. Experimental results on DeiT-small and GPT2-small using 64 × 64 arrays demonstrate that ETA outperforms previous state-of-the-art methods, reducing waiting-for-write (W4W) by up to 66%, latency by up to 20%, and the number of arrays by up to 29%.
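
To give a feel for the array counts the abstract refers to, the following is a minimal sketch of a naive tiling baseline, not the ETA mapping itself. It assumes one matrix element per memory cell (no bit-slicing), and the per-head shapes (head dimension 64, 197 tokens) and the helper name `arrays_needed` are illustrative assumptions that do not come from the record.

```python
import math


def arrays_needed(rows: int, cols: int, array_size: int = 64) -> int:
    """Naive tile count: crossbar arrays needed to hold a rows x cols matrix.

    Assumes one matrix element per cell; real ReRAM mappings usually also
    split each value across several cells, which this sketch ignores.
    """
    return math.ceil(rows / array_size) * math.ceil(cols / array_size)


# Hypothetical per-head attention shapes (assumed, not taken from the record).
head_dim, tokens = 64, 197

k_t_arrays = arrays_needed(head_dim, tokens)  # K^T written for Q @ K^T
v_arrays = arrays_needed(tokens, head_dim)    # V written for scores @ V

print(k_t_arrays, v_arrays)  # prints "4 4"
```

Under these assumptions, the last tile of each matrix is only partly filled (197 is not a multiple of 64), which is the kind of waste an array-aware mapping could plausibly target.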

Keywords

Compute-in-memory; weight mapping
Title
ETA: Efficient Transformer Attention Mapping for ReRAM-Based Compute-In-Memory Architectures
Authors
Rhe, Johnny; Park, Juhong; Jeon, Kang Eun; Ko, Jong Hwan
DOI
10.1109/APCCAS67402.2025.11377506
Publication Date
2025
Type
Conference Paper
Journal
Proceedings - 2025 21st IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2025