TY - JOUR
T1 - A study on the waveform-based end-to-end deep convolutional neural network for weakly supervised sound event detection
AU - Lee, Seokjin
AU - Kim, Minhan
AU - Jeong, Youngho
N1 - Publisher Copyright: © 2020 Journal of the Acoustical Society of Korea. All rights reserved.
PY - 2020/1
Y1 - 2020/1
N2 - This paper studies a deep convolutional neural network for sound event detection. In particular, it studies an end-to-end neural network that generates detection results from the input audio waveform, applied to the weakly supervised problem in which the training data include weakly labeled and unlabeled datasets. The proposed system is based on a network structure consisting of deeply stacked 1-dimensional convolutional neural networks, enhanced by skip connections and a gating mechanism. The system is further enhanced by sound event detection and post-processing steps, and a training step using the mean-teacher model is added to handle the weakly supervised data. The proposed system was evaluated on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 dataset, and the results show F1-scores of 54 % (segment-based) and 32 % (event-based).
AB - This paper studies a deep convolutional neural network for sound event detection. In particular, it studies an end-to-end neural network that generates detection results from the input audio waveform, applied to the weakly supervised problem in which the training data include weakly labeled and unlabeled datasets. The proposed system is based on a network structure consisting of deeply stacked 1-dimensional convolutional neural networks, enhanced by skip connections and a gating mechanism. The system is further enhanced by sound event detection and post-processing steps, and a training step using the mean-teacher model is added to handle the weakly supervised data. The proposed system was evaluated on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 dataset, and the results show F1-scores of 54 % (segment-based) and 32 % (event-based).
KW - Deep convolutional neural network
KW - End-to-end neural network
KW - Sound event detection
KW - Weakly supervised training
UR - http://www.scopus.com/inward/record.url?scp=85088146467&partnerID=8YFLogxK
U2 - 10.7776/ASK.2020.39.1.024
DO - 10.7776/ASK.2020.39.1.024
M3 - Article
AN - SCOPUS:85088146467
SN - 1225-4428
VL - 39
SP - 24
EP - 31
JO - Journal of the Acoustical Society of Korea
JF - Journal of the Acoustical Society of Korea
IS - 1
ER -