TY - JOUR
T1 - VerSA
T2 - Versatile Systolic Array Architecture for Sparse and Dense Matrix Multiplications
AU - Seo, Juwon
AU - Kong, Joonho
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/4
Y1 - 2024/4
N2 - A key part of modern deep neural network (DNN) applications is matrix multiplication. As DNN applications are becoming more diverse, there is a need for both dense and sparse matrix multiplications to be accelerated by hardware. However, most hardware accelerators are designed to accelerate either dense or sparse matrix multiplication. In this paper, we propose VerSA, a versatile systolic array architecture for both dense and sparse matrix multiplications. VerSA employs intermediate paths and SRAM buffers between the rows of the systolic array (SA), thereby enabling an early termination in sparse matrix multiplication with a negligible performance overhead when running dense matrix multiplication. When running sparse matrix multiplication, 256 × 256 VerSA brings performance (i.e., an inverse of execution time) improvement and energy saving by 1.21×–1.60× and 7.5–30.2%, respectively, when compared to the conventional SA. When running dense matrix multiplication, VerSA results in only a 0.52% performance overhead compared to the conventional SA.
AB - A key part of modern deep neural network (DNN) applications is matrix multiplication. As DNN applications are becoming more diverse, there is a need for both dense and sparse matrix multiplications to be accelerated by hardware. However, most hardware accelerators are designed to accelerate either dense or sparse matrix multiplication. In this paper, we propose VerSA, a versatile systolic array architecture for both dense and sparse matrix multiplications. VerSA employs intermediate paths and SRAM buffers between the rows of the systolic array (SA), thereby enabling an early termination in sparse matrix multiplication with a negligible performance overhead when running dense matrix multiplication. When running sparse matrix multiplication, 256 × 256 VerSA brings performance (i.e., an inverse of execution time) improvement and energy saving by 1.21×–1.60× and 7.5–30.2%, respectively, when compared to the conventional SA. When running dense matrix multiplication, VerSA results in only a 0.52% performance overhead compared to the conventional SA.
KW - dense matrix
KW - hardware acceleration
KW - matrix multiplication
KW - sparse matrix
KW - systolic array
UR - http://www.scopus.com/inward/record.url?scp=85191403273&partnerID=8YFLogxK
U2 - 10.3390/electronics13081500
DO - 10.3390/electronics13081500
M3 - Article
AN - SCOPUS:85191403273
SN - 2079-9292
VL - 13
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 8
M1 - 1500
ER -