TY - JOUR
T1 - REF-Net
T2 - Robust, efficient, and fast network for semantic segmentation applications using devices with limited computational resources
AU - Olimov, Bekhzod
AU - Kim, Jeonghong
AU - Paul, Anand
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2021
Y1 - 2021
AB - Considering the importance of autonomous driving applications for mobile devices, it is imperative to develop semantic segmentation models that are both fast and accurate. Thanks to the emergence of Deep Learning (DL) techniques, segmentation models have improved in accuracy. However, the improved performance of currently popular DL models for self-driving car applications comes at the cost of time and computational efficiency. Moreover, networks with efficient model architectures suffer from a lack of accuracy. Therefore, in this study, we propose a robust, efficient, and fast network (REF-Net) that combines carefully formulated encoding and decoding paths. Specifically, the contraction path uses a mixture of dilated and asymmetric convolution layers with skip connections and bottleneck layers, while the decoding path benefits from a nearest neighbor interpolation method that requires no trainable parameters to restore the original image size. This model architecture considerably reduces the number of trainable parameters, the required memory space, and the training and inference time. In fact, the proposed model required nearly 90 times fewer trainable parameters and approximately 4 times less memory space, which allowed a 3-fold faster training runtime and a 2-fold inference speedup in the experiments conducted on the Cambridge-driving Labeled Video Database (CamVid) and Cityscapes datasets. Moreover, despite its notable efficiency in terms of memory and time, REF-Net attained superior results on several segmentation evaluation metrics, with increases of roughly 2%, 4%, and 3% in pixel accuracy, Dice coefficient, and Jaccard Index, respectively.
KW - Autonomous driving
KW - Deep convolutional neural networks
KW - Nearest neighbor interpolation
KW - Semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85099724640&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2021.3052791
DO - 10.1109/ACCESS.2021.3052791
M3 - Article
AN - SCOPUS:85099724640
SN - 2169-3536
VL - 9
SP - 15084
EP - 15098
JO - IEEE Access
JF - IEEE Access
M1 - 9328441
ER -