- Kim, Jaerok;
- Kim, Heeyeon;
- Lee, Yoonmyung
Abstract
Recent advances in Compute-In-Memory (CIM) technology have significantly improved energy efficiency [1-7]. As shown in Fig. 1, among the various data formats, BF16 has been widely adopted because its exponent-based representation offers a wider dynamic range and a lower relative error, in contrast to the absolute error of INT formats, thereby reducing accuracy degradation under approximation. However, several critical challenges remain in BF16-CIM macro implementations: (1) synthesizing an 8-bit exponent-processing circuit demands numerous full adders (FAs) to handle MAC pairs, which degrades energy efficiency and throughput [2]; (2) traditional bit-serial operations, although hardware-efficient, typically require multiple cycles (4 or 8 cycles for 8-bit computations), increasing toggling activity and energy consumption [1,2,6,7]; and (3) existing compact FA structures (e.g., 10T/12T FAs) suffer from threshold voltage (VTH) drops, while exact adder trees significantly increase power, latency, and hardware overhead [4,7].
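The cycle counts in challenge (2) can be illustrated with a minimal sketch of a generic bit-serial dot product. This is not the macro described in the paper: it assumes unsigned 8-bit integer inputs and plain integer weights, and the function name `bit_serial_dot` and its parameters are hypothetical. It only shows why feeding 1 bit per cycle takes 8 cycles for 8-bit inputs, while feeding 2 bits per cycle takes 4.

```python
def bit_serial_dot(weights, inputs, bits=8, bits_per_cycle=1):
    """Dot product computed by feeding the inputs a few bits per cycle.

    Assumptions (illustrative only): inputs are unsigned integers that fit
    in `bits` bits, and `bits` is a multiple of `bits_per_cycle`.
    Returns (result, number_of_cycles).
    """
    assert bits % bits_per_cycle == 0
    mask = (1 << bits_per_cycle) - 1
    acc = 0
    cycles = 0
    for shift in range(0, bits, bits_per_cycle):
        # One cycle: multiply each weight by the current slice of its input
        # and sum across all pairs (the role of the adder tree).
        partial = sum(w * ((x >> shift) & mask) for w, x in zip(weights, inputs))
        # Shift-and-accumulate according to the slice's bit position.
        acc += partial << shift
        cycles += 1
    return acc, cycles


weights = [3, -2, 5]
inputs = [200, 17, 255]
result_1b, cycles_1b = bit_serial_dot(weights, inputs, bits_per_cycle=1)
result_2b, cycles_2b = bit_serial_dot(weights, inputs, bits_per_cycle=2)
```

Both calls produce the same dot product; only the cycle count differs, which is the trade-off between per-cycle hardware width and toggling activity that the abstract refers to.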
- Title
- A 28nm 77.2 TFLOPS/W Digital Floating-Point Compute-In-Memory Macro Employing Dynamic Find-Max and Reduced-Cycle Bit-Serial Architecture with Approximation
- Authors
- Kim, Jaerok; Kim, Heeyeon; Lee, Yoonmyung
- Publication date
- 2025
- Type
- Conference Paper
- Journal name
- 2025 IEEE Asian Solid-State Circuits Conference, A-SSCC 2025 - Proceedings
- Pages
- 142-144