
Data Allocation Rearrangement on CNN Accelerator Based on Reshaping Systolic Tile Array Using Planarized Matrix Reordering Techniques

  • Kyungpook National University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Demand for and use of artificial intelligence (AI) continue to grow, and a wide range of electronic computing devices, such as mobile devices, are now built with AI technology. However, the computational load of AI operations is very heavy. AI data processing, typically a convolutional neural network (CNN), consists mainly of matrix convolution and matrix multiplication; the two are effectively the same operation, depending on how the operation sequence is arranged. Matrix multiplication is poorly suited to sequential data processing. During matrix convolution or matrix multiplication, the memory access pattern is non-linear, which makes parallelized data processing difficult to apply, and a conventional sequential processor has limited hardware resources for parallelized data processing. Even in sequential processing, the non-linear access pattern requires conditional branch tests and jump instructions for address calculation. These factors lead to high power consumption and poor performance during AI operation. An appropriate hardware-software system is therefore needed for efficient AI computation: one that planarizes the matrix structure for linear access and uses parallel data processing based on MAC tiles. In this paper, a parallel data processing structure based on a systolic tile array is used to achieve fast operation time and efficient use of ALU resources in matrix multiplication. To increase data processing throughput, the proposed accelerator is equipped with a heavier processing element (PE) tile than the original systolic tile PE, making it possible to calculate multi-state partial sums at once. Reordering a multidimensional matrix into a linear planarized matrix array is also proposed at the micro-architecture level to reduce non-linear access address calculation. The accelerator's memory system enables multiple elements to be committed and reduces data access times to DRAM during operation. In conclusion, the proposed accelerator performs AI CNN operations much faster, with low power consumption.
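The abstract's claim that convolution and matrix multiplication are effectively the same operation, and that reordering the data gives a linear access pattern, can be illustrated with a minimal sketch. It uses the widely known im2col-style lowering, which is an assumption here and not necessarily the reordering scheme of the paper; the function names `conv2d_direct`, `planarize_patches`, and `conv2d_planarized` are hypothetical and chosen only for this example.

```python
def conv2d_direct(image, kernel):
    """Reference 2D convolution (stride 1, no padding, no kernel flip).

    Every multiply needs a 2D address (r + i, c + j), which is the
    non-linear access pattern the abstract refers to.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(h - kh + 1):
        row = []
        for c in range(w - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def planarize_patches(image, kh, kw):
    """Copy each kh x kw window into one flat list, window after window.

    Hypothetical helper: an im2col-style reordering, assumed for illustration.
    """
    h, w = len(image), len(image[0])
    flat = []
    for r in range(h - kh + 1):
        for c in range(w - kw + 1):
            for i in range(kh):
                flat.extend(image[r + i][c:c + kw])
    return flat


def conv2d_planarized(image, kernel):
    """Same result as conv2d_direct, computed from flat buffers only."""
    kh, kw = len(kernel), len(kernel[0])
    k = kh * kw
    weights = [v for row in kernel for v in row]  # flattened kernel
    patches = planarize_patches(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    # Each output is a dot product over k consecutive addresses,
    # i.e. one row of a matrix-vector product.
    flat_out = [
        sum(patches[base + t] * weights[t] for t in range(k))
        for base in range(0, len(patches), k)
    ]
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, -1]]
assert conv2d_direct(image, kernel) == conv2d_planarized(image, kernel)
```

After the one-time reordering, the multiply-accumulate loop walks the patch buffer with a single incrementing index, which is the kind of access a MAC tile can consume without per-element address arithmetic. The trade-off in this sketch is that overlapping windows are duplicated in the flat buffer.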

Original language: English
Title of host publication: Proceedings - 2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 60-63
Number of pages: 4
ISBN (Electronic): 9798331565718
State: Published - 2025
Event: 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2025 - Singapore, Singapore
Duration: 15 Dec 2025 to 18 Dec 2025

Publication series

Name: Proceedings - 2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2025

Conference

Conference: 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2025
Country/Territory: Singapore
City: Singapore
Period: 15/12/25 to 18/12/25

Keywords

  • Artificial intelligence accelerator
  • Convolutional neural network
  • Matrix or Tensor data processing
  • multicore processor
  • parallelized processing
  • Systolic Array
  • Verilog
