SPARQ: An Accelerator Architecture for Large Language Models with Joint Sparsity and Quantization Techniques

Abstract

Large Language Models (LLMs) have demonstrated unprecedented capabilities in text generation, translation, and summarization tasks. However, their deployment on resource-constrained systems remains challenging due to their large parameter sizes and high computational demands. To address this, we propose SPARQ, a specialized accelerator architecture that leverages both sparsity and quantization to optimize LLM inference. By integrating multiply-accumulate units tailored for quantized operations and a systolic array architecture supporting N:M semi-structured sparsity, SPARQ significantly enhances area and energy efficiency with minimal impact on model quality, as demonstrated in prior work on GPTQ and SparseGPT. Our evaluations demonstrate that SPARQ achieves up to 1.53 times greater area efficiency and 1.58 times better energy efficiency compared to the baseline, particularly for larger models.
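
As a rough illustration of the two ideas the abstract combines, the sketch below applies N:M semi-structured sparsity (at most N nonzero weights in every group of M consecutive weights) and low-bit quantization to a single weight row. It is a minimal sketch under stated assumptions, not SPARQ's design: magnitude-based selection stands in for SparseGPT, per-row round-to-nearest stands in for GPTQ, and the function names `nm_prune`, `quantize_symmetric`, and `sparse_quantized_dot` are hypothetical.

```python
def nm_prune(row, n=2, m=4):
    """Keep the n largest-magnitude values in each group of m; zero the rest.

    Assumption: simple magnitude pruning, not the SparseGPT procedure.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n largest-magnitude entries in this group.
        keep = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def quantize_symmetric(row, bits=4):
    """Symmetric round-to-nearest quantization with one scale per row.

    Assumption: a plain baseline quantizer, not the GPTQ procedure.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return codes, scale


def sparse_quantized_dot(codes, scale, activations):
    """Dot product that skips zero weights and rescales the result once."""
    acc = sum(c * a for c, a in zip(codes, activations) if c != 0)
    return acc * scale


# Hypothetical example values, chosen only for illustration.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.6, 0.01]
pruned = nm_prune(weights, n=2, m=4)            # 2:4 sparsity
codes, scale = quantize_symmetric(pruned, bits=4)
result = sparse_quantized_dot(codes, scale, [1.0] * len(codes))
```

With 2:4 sparsity, half of the multiply-accumulate operations can be skipped, and the remaining ones operate on small integer codes. This is the general kind of saving that a sparsity-aware systolic array with quantized multiply-accumulate units is meant to exploit.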

Keywords

Accelerator; Large Language Model; Sparsity; Quantization
Title
SPARQ: An Accelerator Architecture for Large Language Models with Joint Sparsity and Quantization Techniques
Authors
Choi, Seonggyu; Cho, Hyungmin
DOI
10.1145/3735452.3735523
Publication date
2025-06
Type
Proceedings Paper
Journal name
PROCEEDINGS OF THE 26TH ACM SIGPLAN/SIGBED INTERNATIONAL CONFERENCE ON LANGUAGES, COMPILERS, AND TOOLS FOR EMBEDDED SYSTEMS, LCTES 2025
Pages
3-15