Performance analysis of weakly-supervised sound event detection system based on the mean-teacher convolutional recurrent neural network model

Research output: Contribution to journal › Article › peer-review

Abstract

This paper introduces and implements a Sound Event Detection (SED) system based on weakly-supervised learning, where only part of the data is labeled, and analyzes the effect of its parameters. The SED system estimates the classes and onset/offset times of events in an acoustic signal. To train the model, complete information on the event classes and their onset/offset times must be provided; unfortunately, the onset/offset times are hard to label exactly. Therefore, in the weakly-supervised task, the SED model is trained with "strongly labeled data" containing both the event classes and their activations, "weakly labeled data" containing only the event classes, and "unlabeled data" without any labels. Recently, SED systems using the mean-teacher model have been widely applied to this task; such systems involve several parameters that must be chosen carefully because they affect performance. In this paper, a performance analysis is carried out on parameters such as the input feature, the moving-average parameter, the weight of the consistency cost function, the ramp-up length, and the maximum learning rate, using the data of DCASE 2020 Task 4. The effects of these parameters and their optimal values are discussed.
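For readers unfamiliar with the mean-teacher setup, the following is a minimal sketch of the ingredients named in the abstract: the moving-average parameter for the teacher weights, the consistency-cost weight, and its ramp-up length. All names, values, and the toy model here are illustrative assumptions, not the authors' implementation.

```python
import math
import copy
import torch
import torch.nn as nn


def consistency_weight(step, ramp_up_steps, max_weight):
    """Sigmoid ramp-up of the consistency-cost weight (assumed schedule)."""
    if step >= ramp_up_steps:
        return max_weight
    phase = 1.0 - step / ramp_up_steps
    return max_weight * math.exp(-5.0 * phase * phase)


def update_teacher(student, teacher, alpha):
    """Exponential moving average of student weights into the teacher.

    `alpha` is the moving-average parameter discussed in the paper.
    """
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)


# Toy student/teacher pair standing in for the CRNN used in the paper.
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

bce = nn.BCEWithLogitsLoss()  # classification cost on labeled data
mse = nn.MSELoss()            # consistency cost between student and teacher

x = torch.randn(8, 64)                        # one batch of features
y = torch.randint(0, 2, (8, 10)).float()      # weak/strong labels (toy)

student_out = student(x)
teacher_out = teacher(x).detach()

# Hypothetical hyperparameter values, chosen only for illustration.
step, ramp_up_steps, max_weight, alpha = 100, 1000, 2.0, 0.999

loss = bce(student_out, y) + consistency_weight(step, ramp_up_steps, max_weight) * mse(
    torch.sigmoid(student_out), torch.sigmoid(teacher_out)
)
loss.backward()
update_teacher(student, teacher, alpha)
```

In this sketch, the consistency weight starts near zero and ramps up over `ramp_up_steps` so that early, unreliable teacher predictions do not dominate training; the teacher itself is never updated by gradients, only by the EMA of the student's weights.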

Original language: English
Pages (from-to): 139-147
Number of pages: 9
Journal: Journal of the Acoustical Society of Korea
Volume: 40
Issue number: 2
DOIs
State: Published - 2021

Keywords

  • Convolutional recurrent neural network
  • Mean-teacher
  • Semi-supervised learning
  • Sound event detection

