Abstract
Processing-in-memory (PIM) architectures are well suited to applications that generate complicated memory request patterns; such memory streams usually degrade application performance in conventional memory hierarchies. In particular, deep convolutional neural network (DCNN) processing, which consists of several functionalities, could be highly optimized if PIM cores extended their processing capability and data accessibility. In this work, we propose a functionality-based PIM accelerator for DCNNs. We add several modules to a conventional PIM system based on a hybrid memory cube (HMC). First, we introduce a new buffer module, a shared cache, through which PIM cores are provided with DCNN functionalities and pre-trained weights. The PIM cores thereby improve computational utilization and data accessibility. Second, an efficient replacement method complements the shared cache to reduce the data miss rate of DCNN processing. Third, we design dual prefetchers that handle the memory access patterns of DCNNs, thereby reducing the system's overall latency. Fourth, we design a PIM scheduler for autonomous request control at the PIM core level. The PIM scheduler relieves the host processor of significant computational load, improving the overall latency of the system and reducing energy consumption. In a performance evaluation using a trace-driven HMC simulator, our proposed model improves average latency and bandwidth by 38.9% and 27.9%, respectively, with only 18.7% more energy consumption than conventional HMC-based PIM systems. Our system also scales well: as the DCNN becomes deeper, it processes faster than conventional PIM systems.
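To make the interplay between a shared cache, a replacement method, and a prefetcher concrete, the following is a minimal sketch rather than the accelerator described in the abstract. The abstract does not specify the replacement policy or the prefetcher design, so the LRU policy, the next-line prefetcher, the class and function names, and the synthetic streaming trace are all assumptions chosen only for illustration.

```python
from collections import OrderedDict


class SharedCache:
    """Toy shared buffer. LRU replacement is an assumption, not the paper's method."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # address -> True, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def _fill(self, addr):
        # Bring a line into the cache, evicting the least recently used line if full.
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[addr] = True

    def access(self, addr, prefetch_degree=0):
        # Demand access: count a hit or a miss, then update recency.
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)
        else:
            self.misses += 1
            self._fill(addr)
        # Hypothetical next-line prefetcher: fetch the following lines early.
        for k in range(1, prefetch_degree + 1):
            self._fill(addr + k)

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def run(trace, capacity_lines, prefetch_degree):
    """Replay an address trace and return the demand miss rate."""
    cache = SharedCache(capacity_lines)
    for addr in trace:
        cache.access(addr, prefetch_degree)
    return cache.miss_rate()


# Synthetic trace (assumption): stream through 64 weight lines twice.
trace = list(range(64)) * 2
no_prefetch = run(trace, capacity_lines=16, prefetch_degree=0)
with_prefetch = run(trace, capacity_lines=16, prefetch_degree=1)
print(f"miss rate without prefetch: {no_prefetch:.4f}")
print(f"miss rate with next-line prefetch: {with_prefetch:.4f}")
```

On this streaming trace the working set exceeds the cache, so LRU alone misses on every access, while the next-line prefetcher hides all but the first miss of each pass. Real DCNN access patterns are more irregular, which is presumably why the proposed design pairs a tailored replacement method with dual prefetchers.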
Original language | English |
---|---|
Pages (from-to) | 145098-145108 |
Number of pages | 11 |
Journal | IEEE Access |
Volume | 9 |
State | Published - 2021 |
Keywords
- 3D memory
- accelerator architectures
- artificial intelligence accelerator
- computer system
- deep neural network
- prefetch
- processing-in-memory