Abstract
Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their increasing complexity poses significant challenges in terms of computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face hurdles such as: 1) a sole focus on optimizing attention layers; 2) a lack of sparsity exploitation for transformers; and 3) limited PIM macro capacity and low cell density, which degrade on-chip data reuse and increase external memory access (EMA). This article presents DPIM, a novel 2T1C eDRAM-based transformer-in-memory chip that addresses these challenges through three key innovations: 1) a sparsity-aware quantization (SAQ) scheme that significantly increases bit-slice sparsity in both activation and weight data, achieving ratios of 83.3% and 88.4%, respectively, with minimal accuracy loss; 2) a heterogeneous PIM core capable of efficiently handling both sparse and dense matrix multiplications (MMs); and 3) a high-density 2T1C eDRAM cell with a density of 1.38 Mb/mm², enabling large-capacity PIM macros. By integrating these features, DPIM achieves improved computational efficiency and reduced EMA with enhanced on-chip data reuse. The DPIM chip, fabricated in 28-nm CMOS technology, achieves a throughput of 3.03-12.12 TOPS and an energy efficiency of 4.84-19.36 TOPS/W, measured across INT8 and INT4 operations, respectively. It achieves a throughput density of 0.55 TOPS/mm² with INT8 operation. With a total macro size of 4608 kb, the chip occupies a die area of 20.25 mm² and operates at frequencies from 50 to 285 MHz with a supply voltage of 0.85-1.0 V. DPIM successfully executes BERT-Large on the General Language Understanding Evaluation (GLUE) dataset. Its macro density is 1413 kb/mm², and the resulting density figure-of-merit (FoM), defined as macro density × throughput density, is 1.6× to 115.8× higher than previous works, representing a significant advancement in hardware design for efficient transformer processing.
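The density figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the FoM is computed exactly as the abstract defines it (macro density × throughput density), and the kb-to-Mb conversion assumes binary prefixes (1 Mb = 1024 kb), which the record does not state explicitly.

```python
# Figures quoted in the abstract.
MACRO_DENSITY_KB_PER_MM2 = 1413.0        # macro density, kb/mm^2
THROUGHPUT_DENSITY_TOPS_PER_MM2 = 0.55   # INT8 throughput density, TOPS/mm^2
TOTAL_MACRO_SIZE_KB = 4608.0             # total macro size, kb

# Assumption (not stated in the record): binary prefix, 1 Mb = 1024 kb.
KB_PER_MB = 1024.0


def density_fom(macro_density: float, throughput_density: float) -> float:
    """Density FoM as defined in the abstract: macro density x throughput density."""
    return macro_density * throughput_density


def kb_to_mb(kilobits: float) -> float:
    """Convert kilobits to megabits under the binary-prefix assumption."""
    return kilobits / KB_PER_MB


fom = density_fom(MACRO_DENSITY_KB_PER_MM2, THROUGHPUT_DENSITY_TOPS_PER_MM2)
macro_density_mb = kb_to_mb(MACRO_DENSITY_KB_PER_MM2)
total_macro_mb = kb_to_mb(TOTAL_MACRO_SIZE_KB)

print(f"Density FoM: {fom:.2f} kb*TOPS/mm^4")
print(f"Macro density: {macro_density_mb:.2f} Mb/mm^2")
print(f"Total macro size: {total_macro_mb:.2f} Mb")
```

Under the binary-prefix assumption, 1413 kb/mm² rounds to the 1.38 Mb/mm² quoted for the eDRAM cell, which suggests the two figures are the same quantity in different units; the record itself does not confirm this.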
- Title: DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense-Sparse Core
- Authors: Kim, Donghyuk; Kim, Jae-Young; Cho, Hyunjun; Yoo, Seungjae; Lee, Sukjin; Yune, Sungwoong; Yang, Sejeong; Jeong, Hoichang; Park, Keonhee; Lee, Ki-Soo; Lee, Jongchan; Han, Chanheum; Koo, Gunmo; Han, Yuli; Kim, Jaejin; Kim, Jaemin; Lee, Kyuho Jason; Chae, Joo-Hyung; Cho, Kunhee; Kim, Joo-Young
- Publication date: 2025-10
- Type: Article; Early Access
- Volume: 61
- Issue: 5
- Pages: 2349-2364