VerSA: Versatile Systolic Array Architecture for Sparse and Dense Matrix Multiplications

Juwon Seo, Joonho Kong

Research output: Contribution to journal › Article › peer-review

Abstract

A key part of modern deep neural network (DNN) applications is matrix multiplication. As DNN applications become more diverse, both dense and sparse matrix multiplications need to be accelerated in hardware. However, most hardware accelerators are designed to accelerate either dense or sparse matrix multiplication, but not both. In this paper, we propose VerSA, a versatile systolic array architecture for both dense and sparse matrix multiplications. VerSA employs intermediate paths and SRAM buffers between the rows of the systolic array (SA), enabling early termination in sparse matrix multiplication while incurring negligible overhead when running dense matrix multiplication. For sparse matrix multiplication, a 256 × 256 VerSA improves performance (i.e., the inverse of execution time) by 1.21×–1.60× and saves 7.5–30.2% of energy compared to a conventional SA. For dense matrix multiplication, VerSA incurs only a 0.52% performance overhead compared to the conventional SA.
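The benefit of early termination described in the abstract can be illustrated with a small software sketch. This is a conceptual functional model only, not the VerSA hardware: it contrasts a plain dense multiply with a sparsity-aware variant that skips multiply-accumulate (MAC) operations whose operand is zero, which is the kind of useless work an early-termination mechanism avoids. All function names and the MAC-counting scheme are illustrative assumptions, not from the paper.

```python
# Conceptual sketch (not the VerSA hardware): shows why skipping
# zero operands saves work in sparse matrix multiplication.

def dense_matmul(a, b):
    """Plain dense multiply: every multiply-accumulate is performed."""
    n, m, k = len(a), len(b), len(b[0])
    out = [[0] * k for _ in range(n)]
    for i in range(n):
        for j in range(k):
            for p in range(m):
                out[i][j] += a[i][p] * b[p][j]
    return out

def sparse_aware_matmul(a, b):
    """Skips MACs whose left operand is zero (a software analog of
    terminating useless work early). Returns the product and the
    number of MACs actually executed."""
    n, m, k = len(a), len(b), len(b[0])
    out = [[0] * k for _ in range(n)]
    macs = 0
    for i in range(n):
        for p in range(m):
            if a[i][p] == 0:  # zero operand contributes nothing; skip
                continue
            for j in range(k):
                out[i][j] += a[i][p] * b[p][j]
                macs += 1
    return out, macs
```

For a matrix that is 50% zeros, the sparsity-aware version performs half the MACs while producing an identical result, which is the effect the paper's performance numbers quantify for real hardware.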

Original language: English
Article number: 1500
Journal: Electronics (Switzerland)
Volume: 13
Issue number: 8
DOIs
State: Published - Apr 2024

Keywords

  • dense matrix
  • hardware acceleration
  • matrix multiplication
  • sparse matrix
  • systolic array

