TY - JOUR
T1 - Differential Image-Based Scalable YOLOv7-Tiny Implementation for Clustered Embedded Systems
AU - Hong, Sunghoon
AU - Park, Daejin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Convolutional neural networks (CNNs) for powerful visual image analysis are gaining popularity in artificial intelligence. The main difference in CNNs compared to other artificial neural networks is that many convolutional layers are added, which improve the performance of visual image analysis by extracting the feature maps required for image classification. However, algorithm optimization is required to run applications that require low-latency in edge compute modules with limited processing resources. In this paper, we propose a novel algorithm optimization method for fast CNNs by using continuous differential images. The main idea is to reduce computation variably by using the differential value of the input in each convolutional layer. Also, the proposed method is compatible with all types of CNNs, and the performance is better when the pixel value difference of continuous images is low. We use the DarkNet framework to evaluate our algorithm using fast convolution and half convolution approaches on a clustered system. As a result, when the input frame rate is 10 fps, FLOPs are reduced by about 4.92 times compared to the original YOLOv7-tiny. By reducing the FLOPs of the convolutional layer, the inference speed increases to about 4.86 FPS, performing 1.57 times faster than the original YOLOv7-tiny. In the case of parallel processing that used two edge compute modules for using half convolution approach, FLOPs reduced more, and the response speed improved. In addition, faster Object detection implementation is possible by additionally expanding up to 7 compute modules in a scalable clustered embedded system as much as the user wants.
AB - Convolutional neural networks (CNNs) for powerful visual image analysis are gaining popularity in artificial intelligence. The main difference in CNNs compared to other artificial neural networks is that many convolutional layers are added, which improve the performance of visual image analysis by extracting the feature maps required for image classification. However, algorithm optimization is required to run applications that require low-latency in edge compute modules with limited processing resources. In this paper, we propose a novel algorithm optimization method for fast CNNs by using continuous differential images. The main idea is to reduce computation variably by using the differential value of the input in each convolutional layer. Also, the proposed method is compatible with all types of CNNs, and the performance is better when the pixel value difference of continuous images is low. We use the DarkNet framework to evaluate our algorithm using fast convolution and half convolution approaches on a clustered system. As a result, when the input frame rate is 10 fps, FLOPs are reduced by about 4.92 times compared to the original YOLOv7-tiny. By reducing the FLOPs of the convolutional layer, the inference speed increases to about 4.86 FPS, performing 1.57 times faster than the original YOLOv7-tiny. In the case of parallel processing that used two edge compute modules for using half convolution approach, FLOPs reduced more, and the response speed improved. In addition, faster Object detection implementation is possible by additionally expanding up to 7 compute modules in a scalable clustered embedded system as much as the user wants.
KW - Convolutional neural networks
KW - clustered systems
KW - deep learning
KW - fast convolution
UR - http://www.scopus.com/inward/record.url?scp=85207147809&partnerID=8YFLogxK
U2 - 10.1109/TITS.2024.3419095
DO - 10.1109/TITS.2024.3419095
M3 - Article
AN - SCOPUS:85207147809
SN - 1524-9050
VL - 25
SP - 16036
EP - 16047
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 11
ER -