- Rhe, Johnny;
- Park, Juhong;
- Jeon, Kang Eun;
- Ko, Jong Hwan
Citations: Web of Science 0; Scopus 0
Abstract
Transformer models have set new performance benchmarks in vision and language applications. However, their attention mechanisms remain poorly suited to Compute-In-Memory (CIM) architectures because of frequent memory writes and compute-write dependencies. These limitations often increase latency and lead to inefficient resource use. In this work, we propose Efficient Transformer Attention mapping (ETA), a novel approach optimized for ReRAM-based CIM systems. ETA alleviates the compute-write bottleneck by enabling parallel execution of computation and memory writes, and it reduces the number of required arrays through an array-aware mapping strategy. Together, these two optimizations significantly improve both energy efficiency and latency. Experimental results on DeiT-small and GPT2-small using 64 × 64 arrays demonstrate that ETA outperforms previous state-of-the-art methods, reducing waiting-for-write (W4W) by up to 66%, latency by up to 20%, and the number of arrays by up to 29%.
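The abstract names two levers: overlapping memory writes with computation, and lowering the number of arrays. Both can be illustrated with a toy cost model. This is a minimal sketch and is not the ETA mapping from the paper. It assumes a generic two-buffer (ping-pong) schedule, uniform per-tile write and compute times in arbitrary units, and naive one-tile-per-array placement on 64 × 64 crossbars. All names (`Schedule`, `arrays_needed`, `serialized`, `overlapped`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Toy model only: NOT the ETA scheme from the paper. Times are hypothetical
# abstract units and every tile is assumed to cost the same to write/compute.


@dataclass(frozen=True)
class Schedule:
    total_time: int      # end-to-end latency of the schedule
    wait_for_write: int  # time the compute unit sits idle waiting on writes


def arrays_needed(rows: int, cols: int, array_size: int = 64) -> int:
    """Crossbars needed to hold a rows x cols matrix under naive tiling
    (one tile per array, no packing of partial tiles)."""
    return math.ceil(rows / array_size) * math.ceil(cols / array_size)


def serialized(num_tiles: int, write: int, compute: int) -> Schedule:
    """Each tile is fully written, then computed, before the next write starts."""
    return Schedule(num_tiles * (write + compute), num_tiles * write)


def overlapped(num_tiles: int, write: int, compute: int) -> Schedule:
    """Generic two-buffer schedule: tile i+1 is written into a spare array
    while tile i is being computed (assumed, not taken from the paper)."""
    if num_tiles == 0:
        return Schedule(0, 0)
    # The first write cannot be hidden; each later step costs the longer of
    # the write and the compute it overlaps with, then the last compute runs.
    total = write + (num_tiles - 1) * max(write, compute) + compute
    # Compute stalls for the first write, then for any write time in excess
    # of the compute time at each of the remaining steps.
    stall = write + (num_tiles - 1) * max(0, write - compute)
    return Schedule(total, stall)


if __name__ == "__main__":
    print(serialized(4, 10, 4))  # Schedule(total_time=56, wait_for_write=40)
    print(overlapped(4, 10, 4))  # Schedule(total_time=44, wait_for_write=28)
    print(arrays_needed(64, 197))  # 4
```

With four tiles, a write time of 10 and a compute time of 4, the serialized schedule exposes every write (40 units of waiting-for-write), while the overlapped one exposes only the first write plus the excess of each later write over compute (28 units). `arrays_needed(64, 197)` shows how a 64 × 197 operand spills into four arrays under naive tiling. Reducing counts like this is what an array-aware mapping targets, and the paper describes ETA's actual strategy for doing so.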
- Title: ETA: Efficient Transformer Attention Mapping for ReRAM-Based Compute-In-Memory Architectures
- Authors: Rhe, Johnny; Park, Juhong; Jeon, Kang Eun; Ko, Jong Hwan
- Publication date: 2025
- Type: Conference Paper
- Journal: Proceedings - 2025 21st IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2025