Weight initialization based-rectified linear unit activation function to improve the performance of a convolutional neural network model

Bekhzod Olimov, Sanjar Karshiev, Eungyeong Jang, Sadia Din, Anand Paul, Jeonghong Kim

Research output: Contribution to journal › Article › peer-review

Abstract

Convolutional Neural Networks (CNNs) have made a great impact on attaining state-of-the-art results in image classification tasks. Weight initialization is one of the fundamental steps in formulating a CNN model, and it can determine the failure or success of the model. In this paper, we investigate the mathematical background of different weight initialization strategies to determine the one with better performance. For smooth training, we expect the activations of each layer of the CNN model to follow a standard normal distribution with mean 0 and standard deviation 1; this prevents gradients from vanishing and leads to smoother training. However, we found that even with an appropriate weight initialization technique, a regular Rectified Linear Unit (ReLU) activation function increases the activation mean value. In this paper, we address this issue by proposing a weight initialization based (WIB)-ReLU activation function. The proposed method results in smoother training. Moreover, our experiments show that WIB-ReLU outperforms ReLU, Leaky ReLU, parametric ReLU, and exponential linear unit activation functions, yielding up to a 20% decrease in loss value and a 5% increase in accuracy score on both the Fashion-MNIST and CIFAR-10 datasets.
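The abstract's central observation can be illustrated numerically. The sketch below (a minimal NumPy example, not the authors' WIB-ReLU implementation) uses He initialization, a common choice for ReLU layers, and shows that while pre-activations stay roughly zero-mean, applying ReLU shifts the activation mean above zero, which is the issue the paper addresses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard-normal input activations (mean 0, SD 1), as the abstract assumes.
x = rng.standard_normal((10_000, 512))

# He (Kaiming) initialization, commonly paired with ReLU: SD = sqrt(2 / fan_in).
fan_in = x.shape[1]
w = rng.standard_normal((fan_in, 512)) * np.sqrt(2.0 / fan_in)

pre_activation = x @ w
post_relu = np.maximum(pre_activation, 0.0)

# Pre-activations remain roughly zero-mean, but ReLU zeroes the negative half,
# so the post-activation mean drifts above 0 -- the problem WIB-ReLU targets.
print(f"pre-activation mean={pre_activation.mean():+.3f}, std={pre_activation.std():.3f}")
print(f"post-ReLU      mean={post_relu.mean():+.3f}, std={post_relu.std():.3f}")
```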

Original language: English
Article number: e6143
Journal: Concurrency and Computation: Practice and Experience
Volume: 33
Issue number: 22
DOIs
State: Published - 25 Nov 2021

Keywords

  • ReLU
  • activation function
  • convolutional neural networks
  • deep learning
  • weight initialization
