TY - JOUR
T1 - Efficient Partial Weight Update Techniques for Lightweight On-Device Learning on Tiny Flash-Embedded MCUs
AU - Kwon, Jisu
AU - Park, Daejin
N1 - Publisher Copyright:
© 2009-2012 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
N2 - Typical training procedures involve read and write operations for weight updates during backpropagation. However, on-device training on microcontroller units (MCUs) presents two challenges. First, the on-chip SRAM has insufficient capacity to store the weights. Second, the large flash memory, which is constrained in write access, becomes necessary to accommodate the network for on-device training on MCUs. To tackle these memory constraints, we propose a partial weight update technique based on gradient delta computation. The weights are stored in flash memory, and only the portion of the weights to be updated is selectively copied from the flash memory to the SRAM. We implemented this approach for training a fully connected network on an on-device MNIST digit classification task using only 20-kB SRAM and 1912-kB flash memory on an MCU. The proposed technique achieves reasonable accuracy, comparable to state-of-the-art results, while updating only 18.52% of the weights. Furthermore, taking into account the area scale factor, we achieved a reduction of up to 46.9% in the area-power-delay product compared to a commercially available high-performance MCU capable of embedding the entire set of model parameters.
AB - Typical training procedures involve read and write operations for weight updates during backpropagation. However, on-device training on microcontroller units (MCUs) presents two challenges. First, the on-chip SRAM has insufficient capacity to store the weights. Second, the large flash memory, which is constrained in write access, becomes necessary to accommodate the network for on-device training on MCUs. To tackle these memory constraints, we propose a partial weight update technique based on gradient delta computation. The weights are stored in flash memory, and only the portion of the weights to be updated is selectively copied from the flash memory to the SRAM. We implemented this approach for training a fully connected network on an on-device MNIST digit classification task using only 20-kB SRAM and 1912-kB flash memory on an MCU. The proposed technique achieves reasonable accuracy, comparable to state-of-the-art results, while updating only 18.52% of the weights. Furthermore, taking into account the area scale factor, we achieved a reduction of up to 46.9% in the area-power-delay product compared to a commercially available high-performance MCU capable of embedding the entire set of model parameters.
KW - Artificial neural networks
KW - embedded software
KW - flash memories
KW - microcontrollers
UR - http://www.scopus.com/inward/record.url?scp=85173302220&partnerID=8YFLogxK
U2 - 10.1109/LES.2023.3298731
DO - 10.1109/LES.2023.3298731
M3 - Article
AN - SCOPUS:85173302220
SN - 1943-0663
VL - 15
SP - 206
EP - 209
JO - IEEE Embedded Systems Letters
JF - IEEE Embedded Systems Letters
IS - 4
ER -