TY - GEN
T1 - Dynamic MAC Unit Pruning Techniques in Runtime RTL Simulation for Area-Accuracy Efficient Implementation of Neural Network Accelerator
AU - Kwon, Jisu
AU - Yun, Heuijee
AU - Park, Daejin
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Designing lightweight hardware accelerators that maintain model inference accuracy is a challenging task in edge artificial intelligence (AI) because of the heterogeneous design requirements between the software model and the hardware accelerator. While model pruning is an effective approach for reducing the number of parameters and the computational requirements, conventional software model pruning incurs inefficient overhead when the model is deployed on a general-purpose accelerator for inference. To overcome these challenges, we propose an empirical register transfer level (RTL) simulation-based accelerator pruning technique that is optimized for domain applications and models. Specifically, the proposed technique measures the dynamic switching of unit-wise processing element (PE) signals during dataset sample inference on a 2D PE array and replaces low-switching PEs with dummy PEs based on the RTL simulation results. The proposed approach reduces the accelerator's area and power consumption while maintaining baseline accuracy. We automated the RTL generation and simulation reconfiguration process to enable PE pruning, which requires iterative, empirical RTL simulation based on signal switching count results. Our experimental results show that while the model's inference accuracy decreased by only 1% (from 98% to 97%), dynamic signal switching and synthesized area were reduced by up to 9.78% and 4.25%, respectively. Our approach enables a lightweight hardware accelerator design that is dedicated to the target application and can be scaled without modifying the model.
AB - Designing lightweight hardware accelerators that maintain model inference accuracy is a challenging task in edge artificial intelligence (AI) because of the heterogeneous design requirements between the software model and the hardware accelerator. While model pruning is an effective approach for reducing the number of parameters and the computational requirements, conventional software model pruning incurs inefficient overhead when the model is deployed on a general-purpose accelerator for inference. To overcome these challenges, we propose an empirical register transfer level (RTL) simulation-based accelerator pruning technique that is optimized for domain applications and models. Specifically, the proposed technique measures the dynamic switching of unit-wise processing element (PE) signals during dataset sample inference on a 2D PE array and replaces low-switching PEs with dummy PEs based on the RTL simulation results. The proposed approach reduces the accelerator's area and power consumption while maintaining baseline accuracy. We automated the RTL generation and simulation reconfiguration process to enable PE pruning, which requires iterative, empirical RTL simulation based on signal switching count results. Our experimental results show that while the model's inference accuracy decreased by only 1% (from 98% to 97%), dynamic signal switching and synthesized area were reduced by up to 9.78% and 4.25%, respectively. Our approach enables a lightweight hardware accelerator design that is dedicated to the target application and can be scaled without modifying the model.
UR - http://www.scopus.com/inward/record.url?scp=85185381114&partnerID=8YFLogxK
U2 - 10.1109/MWSCAS57524.2023.10406146
DO - 10.1109/MWSCAS57524.2023.10406146
M3 - Conference contribution
AN - SCOPUS:85185381114
T3 - Midwest Symposium on Circuits and Systems
SP - 207
EP - 211
BT - 2023 IEEE 66th International Midwest Symposium on Circuits and Systems, MWSCAS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE 66th International Midwest Symposium on Circuits and Systems, MWSCAS 2023
Y2 - 6 August 2023 through 9 August 2023
ER -