- Choi, Seonggyu;
- Cho, Hyungmin
Abstract
Large Language Models (LLMs) have demonstrated unprecedented capabilities in text generation, translation, and summarization tasks. However, deploying them on resource-constrained systems remains challenging because of their large parameter counts and high computational demands. To address this, we propose SPARQ, a specialized accelerator architecture that leverages both sparsity and quantization to optimize LLM inference. SPARQ integrates multiply-accumulate (MAC) units tailored for quantized operations with a systolic array that supports N:M semi-structured sparsity. This combination significantly improves area and energy efficiency, while prior work on GPTQ and SparseGPT has shown that such compression has minimal impact on model quality. Our evaluations show that SPARQ achieves up to 1.53 times higher area efficiency and 1.58 times higher energy efficiency than the baseline, particularly for larger models.
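To make the two compression techniques in the abstract concrete, the following plain-Python sketch shows three steps. First, a weight row is stored in 2:4 semi-structured sparse form. Second, the kept weights are quantized to low-bit integers. Third, a dot product is computed with an integer multiply-accumulate loop that reads only the activations selected by the stored offsets.

This is an illustrative assumption, not SPARQ's hardware design. Pruning here is simple magnitude selection rather than SparseGPT, and quantization is round-to-nearest with a single per-tensor scale rather than GPTQ. The function names (`compress_n_m`, `quantize_symmetric`, `sparse_quant_dot`) and the 4-bit weight / 8-bit activation choice are hypothetical.

```python
from typing import List, Tuple


def compress_n_m(row: List[float], n: int = 2, m: int = 4) -> Tuple[List[float], List[int]]:
    """Magnitude-prune a weight row to N:M sparsity and store it compactly.

    In every group of m consecutive weights, only the n largest-magnitude
    values are kept, together with their offset inside the group (2 bits
    each for 2:4). Pruned weights are simply not stored.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    values: List[float] = []
    offsets: List[int] = []
    for start in range(0, len(row), m):
        ranked = sorted(range(m), key=lambda j: abs(row[start + j]), reverse=True)
        for j in sorted(ranked[:n]):  # keep offsets in ascending order
            values.append(row[start + j])
            offsets.append(j)
    return values, offsets


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization with one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) for v in values], scale


def sparse_quant_dot(q_w: List[int], w_scale: float, offsets: List[int],
                     q_x: List[int], x_scale: float, n: int = 2, m: int = 4) -> float:
    """Dot product of a compressed N:M weight row with a dense activation vector.

    Each stored weight selects its activation via (group index, offset), so
    only n of every m activations are read. Products are accumulated in an
    integer accumulator and rescaled to floating point once at the end.
    """
    acc = 0
    for k, (w, j) in enumerate(zip(q_w, offsets)):
        acc += w * q_x[(k // n) * m + j]
    return acc * w_scale * x_scale


if __name__ == "__main__":
    w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.6, 0.01]
    x = [1.0, 2.0, -1.0, 0.5, 0.25, -2.0, 1.5, 3.0]
    vals, offs = compress_n_m(w)                  # 2:4 keeps 4 of the 8 weights
    q_w, s_w = quantize_symmetric(vals, bits=4)   # hypothetical 4-bit weights
    q_x, s_x = quantize_symmetric(x, bits=8)      # hypothetical 8-bit activations
    print("kept values:", vals, "offsets:", offs)
    print("quantized sparse dot:", sparse_quant_dot(q_w, s_w, offs, q_x, s_x))
```

Every group of four weights keeps exactly two, so the stored offsets fit in 2 bits and each output needs only half the multiplications of the dense row. This fixed ratio is generally what makes N:M sparsity regular enough to map onto a systolic array, unlike unstructured sparsity.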
- Title: SPARQ: An Accelerator Architecture for Large Language Models with Joint Sparsity and Quantization Techniques
- Authors: Choi, Seonggyu; Cho, Hyungmin
- Publication date: 2025-06
- Type: Proceedings Paper
- Journal/Proceedings: PROCEEDINGS OF THE 26TH ACM SIGPLAN/SIGBED INTERNATIONAL CONFERENCE ON LANGUAGES, COMPILERS, AND TOOLS FOR EMBEDDED SYSTEMS, LCTES 2025
- Pages: 3–15