Waveform-based End-to-end Deep Convolutional Neural Network with Multi-scale Sliding Windows for Weakly Labeled Sound Event Detection

Seokjin Lee, Minhan Kim

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Scopus citations

Abstract

This paper proposes a waveform-based end-to-end sound event detection algorithm that detects and classifies sound events using a deep convolutional neural network architecture. While most machine-learning-based acoustic signal processing systems rely on hand-crafted feature vectors, e.g., log-Mel spectrograms, end-to-end methods that operate directly on raw input data have recently been investigated for various applications. We therefore develop an end-to-end convolutional neural network architecture for sound event detection. The proposed model consists of multi-scale time frames and networks that handle both short and long signal characteristics; each frame slides in steps of 0.1 second to provide a sufficiently fine time resolution. The element network for each time frame consists of several one-dimensional convolutional neural networks with a deeply stacked structure. The outputs of the element networks are averaged and gated by sound activity detection. To handle unlabeled data, the trained networks are enhanced using the mean-teacher model. Decisions are made via double thresholding, and the results are refined using class-wise minimum gap/length compensation. The approach is evaluated in simulations with development data from DCASE 2019 Task 4; the proposed algorithm achieves a macro-averaged F1 score of 31.7% on the DCASE 2019 development dataset, 30.2% on the DCASE 2018 evaluation dataset, and 26.7% on the DCASE 2019 evaluation dataset.
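
The decision stage named in the abstract (double thresholding followed by minimum gap/length compensation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold values, and the exact merging and filtering rules are assumptions chosen for illustration, and only the 0.1 second frame shift comes from the abstract. The sketch works on per-frame probabilities for a single class; in a class-wise setup, each class would get its own `min_gap` and `min_len`.

```python
from typing import List, Tuple

HOP_SECONDS = 0.1  # frame shift stated in the abstract


def double_threshold(probs: List[float], high: float, low: float) -> List[bool]:
    """Hysteresis-style decision (assumed form of double thresholding):
    a frame at or above `high` seeds an event, which is then extended in
    both directions while neighbouring frames stay at or above `low`."""
    n = len(probs)
    active = [False] * n
    i = 0
    while i < n:
        if probs[i] >= high:
            j = i
            while j >= 0 and probs[j] >= low:   # extend to the left
                active[j] = True
                j -= 1
            j = i + 1
            while j < n and probs[j] >= low:    # extend to the right
                active[j] = True
                j += 1
            i = j
        else:
            i += 1
    return active


def to_segments(active: List[bool]) -> List[Tuple[int, int]]:
    """Convert a frame mask into (start, end) frame indices, end exclusive."""
    segments = []
    start = None
    for idx, flag in enumerate(active):
        if flag and start is None:
            start = idx
        elif not flag and start is not None:
            segments.append((start, idx))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments


def compensate(segments: List[Tuple[int, int]], min_gap: int,
               min_len: int) -> List[Tuple[int, int]]:
    """Assumed compensation rule: merge events separated by fewer than
    `min_gap` frames, then drop events shorter than `min_len` frames."""
    merged: List[Tuple[int, int]] = []
    for start, end in segments:
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_len]


# Hypothetical per-frame probabilities for one class (illustrative values only).
probs = [0.1, 0.4, 0.8, 0.6, 0.2, 0.2, 0.7, 0.9, 0.1, 0.1, 0.1, 0.6, 0.1]
mask = double_threshold(probs, high=0.7, low=0.3)
events = compensate(to_segments(mask), min_gap=3, min_len=4)
events_sec = [(round(s * HOP_SECONDS, 1), round(e * HOP_SECONDS, 1)) for s, e in events]
print(events_sec)
```

In this example the isolated frame with probability 0.6 never becomes an event because no frame in its neighbourhood reaches the high threshold, while the two detected events are merged because the gap between them is shorter than `min_gap`.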

Original language: English
Title of host publication: 2020 International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 182-186
Number of pages: 5
ISBN (Electronic): 9781728149851
State: Published - Feb 2020
Event: 2nd International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2020 - Fukuoka, Japan
Duration: 19 Feb 2020 to 21 Feb 2020

Publication series

Name: 2020 International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2020

Conference

Conference: 2nd International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2020
Country/Territory: Japan
City: Fukuoka
Period: 19/02/20 to 21/02/20

Keywords

  • convolutional neural network
  • end-to-end
  • sound event detection
  • waveform
  • weakly supervised
