A 28nm 77.2 TFLOPS/W Digital Floating-Point Compute-In-Memory Macro Employing Dynamic Find-Max and Reduced-Cycle Bit-Serial Architecture with Approximation
Citations (Scopus)
0

Abstract

Recent advances in Compute-In-Memory (CIM) technology have significantly improved energy efficiency [1-7]. As shown in Fig. 1, among the various data formats, BF16 has been widely adopted because its exponent-based representation offers a wider dynamic range and a lower relative error, in contrast to the absolute error of INT formats, thereby reducing accuracy degradation under approximation. However, several critical challenges remain in BF16-CIM macro implementations: (1) synthesizing the 8-bit exponent processing circuits demands numerous full adders (FAs) to handle the MAC pairs, which degrades energy efficiency and throughput [2]; (2) traditional bit-serial operations, although hardware-efficient, typically require multiple cycles (4 or 8 cycles for 8-bit computations), which increases toggling activity and energy consumption [1, 2, 6, 7]; and (3) existing compact FA structures (e.g., 10T/12T FAs) suffer from threshold-voltage (VTH) drops, while exact adder trees significantly increase power, latency, and hardware overhead [4, 7].
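
To make challenge (2) concrete, the following is a minimal sketch of a conventional bit-serial dot product, not the architecture proposed in the paper. It assumes unsigned 8-bit inputs streamed LSB first; the function name `bit_serial_dot` and the `bits_per_cycle` parameter are hypothetical, and reading the "4/8 cycles" figure as 2 bits or 1 bit per cycle is an assumption.

```python
def bit_serial_dot(inputs, weights, n_bits=8, bits_per_cycle=1):
    """Dot product computed by streaming slices of each input, LSB first.

    Illustrative only: `inputs` are unsigned n_bits-wide integers and
    `weights` are plain integers held in the (hypothetical) memory array.
    Returns the result and the number of cycles used.
    """
    mask = (1 << bits_per_cycle) - 1
    acc = 0
    cycles = 0
    for shift in range(0, n_bits, bits_per_cycle):
        # Each cycle, every input contributes only a small slice of its bits.
        slices = [(x >> shift) & mask for x in inputs]
        # The slice gates the stored weight; an adder tree sums the products.
        partial = sum(s * w for s, w in zip(slices, weights))
        # Shift-and-accumulate restores the slice's binary weight.
        acc += partial << shift
        cycles += 1
    return acc, cycles


result_1b, cycles_1b = bit_serial_dot([3, 5, 255], [2, -1, 4])
result_2b, cycles_2b = bit_serial_dot([3, 5, 255], [2, -1, 4], bits_per_cycle=2)
```

In this sketch the adder tree and accumulator switch once per cycle, so an 8-bit input costs 8 cycles at 1 bit per cycle and 4 cycles at 2 bits per cycle, which is why lowering the cycle count lowers toggling activity.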

Title
A 28nm 77.2 TFLOPS/W Digital Floating-Point Compute-In-Memory Macro Employing Dynamic Find-Max and Reduced-Cycle Bit-Serial Architecture with Approximation
Authors
Kim, Jaerok; Kim, Heeyeon; Lee, Yoonmyung
DOI
10.1109/A-SSCC67472.2025.11349645
Publication Date
2025
Type
Conference Paper
Journal
2025 IEEE Asian Solid-State Circuits Conference, A-SSCC 2025 - Proceedings
Pages
142-144