
Dual-MAC: Hardware Acceleration of DNN Inferences With a Customized Lightweight Brain Floating-Point Format

Research output: Contribution to journal › Article › peer-review

Abstract

Due to the data explosion in deep neural networks (DNNs), recent advancements have focused on reducing the amount of data while preserving accuracy. Despite the advent of low-precision data formats such as 16-bit brain floating point (bfloat16), the amount of data processed in DNN models keeps increasing due to ever-growing parameter sizes. Data reduction combined with efficient processing is therefore crucial for real-world deployment of DNN models. In this paper, we propose a lightweight customized brain floating-point (bfloat) format derived from the bfloat16 data format. Based on our observations of pre-trained weights, we reduce each bfloat16 weight element to as few as 9 bits by fixing the exponents and truncating the mantissa bits. For efficient processing of our lightweight customized bfloat format, we also propose a dual-MAC hardware architecture that achieves up to 2× higher throughput when processing our proposed format compared to processing the conventional bfloat16 format. We devise a novel processing element (PE) architecture for systolic arrays that uses a lookup table (LUT) for multiplications between the fixed exponent bits and a portion of the mantissa bits. Our PE architecture also supports versatile processing of both the conventional bfloat16 format and our customized bfloat format. According to our evaluation across six widely used DNN models, our customized bfloat format yields 37.95% data reduction on average compared to the conventional bfloat16 format, with only small accuracy losses. In terms of performance, our dual-MAC hardware with the customized bfloat format outperforms a conventional systolic array processing the conventional bfloat16 format by 56.2%. Our dual-MAC hardware also offers better performance-energy trade-offs, as its performance gains outweigh its increased power consumption.
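As an illustrative sketch only (not the authors' implementation), the Python snippet below shows how a weight might be compacted along the lines the abstract describes: round to bfloat16, replace the 8-bit exponent with an index into a small table of fixed exponents, and truncate the mantissa. The concrete bit layout (1 sign bit, a 3-bit fixed-exponent index, a 5-bit mantissa) and the FIXED_EXPONENTS table are assumptions chosen to total 9 bits; the paper's exact allocation may differ.

import struct

# ASSUMED 9-bit layout for illustration (not the paper's exact encoding):
# 1 sign bit | 3-bit index into a table of fixed exponents | 5-bit mantissa.
FIXED_EXPONENTS = list(range(120, 128))  # assumed: 8 common biased exponents

def float_to_bfloat16_bits(x: float) -> int:
    # bfloat16 is the top 16 bits of float32: sign(1) | exponent(8) | mantissa(7).
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16

def encode_custom9(x: float) -> int:
    # Pack a weight into the assumed 9-bit custom format.
    b16 = float_to_bfloat16_bits(x)
    sign = (b16 >> 15) & 0x1
    exp = (b16 >> 7) & 0xFF
    man = b16 & 0x7F
    # Snap the exponent to the nearest entry of the fixed-exponent table
    # (clamping out-of-range exponents; this policy is also an assumption).
    idx = min(range(len(FIXED_EXPONENTS)),
              key=lambda i: abs(FIXED_EXPONENTS[i] - exp))
    return (sign << 8) | (idx << 5) | (man >> 2)  # truncate mantissa to 5 bits

def decode_custom9(code: int) -> float:
    # Expand the 9-bit code back to float32 for verification.
    sign = (code >> 8) & 0x1
    exp = FIXED_EXPONENTS[(code >> 5) & 0x7]
    man = (code & 0x1F) << 2  # truncated mantissa bits come back as zeros
    bits32 = ((sign << 15) | (exp << 7) | man) << 16
    return struct.unpack(">f", struct.pack(">I", bits32))[0]

w = 0.8125
print(f"{w} -> 0b{encode_custom9(w):09b} -> {decode_custom9(w)}")
# 0.8125 -> 0b011010100 -> 0.8125

Because the exponents are drawn from a small fixed set, a LUT inside each PE could hold precomputed products involving those exponent values, consistent with the LUT-based multiplication the abstract describes.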

Original language: English
Pages (from-to): 165738-165750
Number of pages: 13
Journal: IEEE Access
Volume: 13
DOIs:
State: Published - 2025

Keywords

  • brain floating point
  • custom weight format
  • deep neural network inference
  • hardware acceleration
  • throughput
