TY - GEN
T1 - Bit-Separable Radix-4 Booth Multiplier for Power-Efficient CNN Accelerator
AU - Park, Seunghyun
AU - Park, Daejin
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - As the demand for efficient computational hardware escalates, optimizing power-hungry multipliers becomes paramount. This is particularly crucial as high-performance AI applications shift towards low-power edge devices, which require reduced power consumption. This paper introduces a novel bit-separable radix-4 Booth multiplier tailored for low-power training and inference on edge devices. Our proposed CNN accelerator with the bit-separable multiplier maximizes hardware reusability through a structural division of the multiplicand, and it increases speed by first calculating the higher bits of the multiplicand and then decoding the dynamic range of the results to omit processing of the lower bits. To accommodate various AI models, experiments were conducted using expandable off-chip accelerators. We manufactured an off-chip accelerator chip using a commercial 130 nm process. Experimental results showed that, compared with a traditional radix-4 Booth multiplier, chip size was reduced by 18.8%, total power consumption decreased by 68%, computational speed increased by 47%, and computational resources were reduced by 53%.
AB - As the demand for efficient computational hardware escalates, optimizing power-hungry multipliers becomes paramount. This is particularly crucial as high-performance AI applications shift towards low-power edge devices, which require reduced power consumption. This paper introduces a novel bit-separable radix-4 Booth multiplier tailored for low-power training and inference on edge devices. Our proposed CNN accelerator with the bit-separable multiplier maximizes hardware reusability through a structural division of the multiplicand, and it increases speed by first calculating the higher bits of the multiplicand and then decoding the dynamic range of the results to omit processing of the lower bits. To accommodate various AI models, experiments were conducted using expandable off-chip accelerators. We manufactured an off-chip accelerator chip using a commercial 130 nm process. Experimental results showed that, compared with a traditional radix-4 Booth multiplier, chip size was reduced by 18.8%, total power consumption decreased by 68%, computational speed increased by 47%, and computational resources were reduced by 53%.
KW - bit-separable multiplier
KW - dynamic range decoder
KW - edge device
KW - low-power
KW - radix-4 Booth multiplier
UR - http://www.scopus.com/inward/record.url?scp=85194138684&partnerID=8YFLogxK
U2 - 10.1109/COOLCHIPS61292.2024.10531170
DO - 10.1109/COOLCHIPS61292.2024.10531170
M3 - Conference contribution
AN - SCOPUS:85194138684
T3 - IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2024 - Proceedings
BT - IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2024
Y2 - 17 April 2024 through 19 April 2024
ER -