TY - JOUR
T1 - DCBT-Net
T2 - Training Deep Convolutional Neural Networks with Extremely Noisy Labels
AU - Olimov, Bekhzod
AU - Kim, Jeonghong
AU - Paul, Anand
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2020
Y1 - 2020
N2 - Obtaining correctly labeled data is crucial for Convolutional Neural Network (CNN) models to attain state-of-the-art performance. However, labeling datasets is a time-consuming and expensive process because it requires expert knowledge of a particular domain. Real-life datasets therefore often contain incorrect labels owing to the involvement of nonexperts in the labeling process. Although the problem of poorly labeled datasets has been studied, the existing methods are complex and difficult to reproduce. In this study, we propose a simpler algorithm called 'Deep Clean Before Training Net' (DCBT-Net), which cleans wrongly labeled data points using information from the eigenvalues of the Laplacian matrix obtained from similarities between the data samples. A deep CNN (DCNN) is then trained on the cleaned data. The performance of DCBT-Net was tested on three publicly available datasets: the Modified National Institute of Standards and Technology (MNIST) database of handwritten digits, the Canadian Institute for Advanced Research (CIFAR-10) dataset, and the WebVision1000 dataset. Compared with existing state-of-the-art methods, DCBT-Net attained average accuracy increases of 15%, 20%, and 3% on the MNIST, CIFAR-10, and WebVision datasets, respectively. The proposed approach also achieved better results in specificity, sensitivity, positive predictive value, and negative predictive value.
KW - Clustering
KW - deep convolutional neural networks
KW - eigenvalues and eigenvectors
KW - image classification
KW - noisy (corrupted) labels
UR - http://www.scopus.com/inward/record.url?scp=85097380853&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3041873
DO - 10.1109/ACCESS.2020.3041873
M3 - Article
AN - SCOPUS:85097380853
SN - 2169-3536
VL - 8
SP - 220482
EP - 220495
JO - IEEE Access
JF - IEEE Access
M1 - 9276394
ER -