TY - JOUR
T1 - Data Augmentation using Checked Pattern Mask for Acoustic Scene Classification
AU - Baek, Seungjae
AU - Lee, Seokjin
N1 - Publisher Copyright: © 2022 Proceedings of the International Congress on Acoustics. All rights reserved.
PY - 2022
Y1 - 2022
AB - Recent acoustic scene classification (ASC) algorithms, which classify environmental audio signals into acoustic scenes such as shopping malls and trains, rely mainly on machine learning. Various techniques, such as subsystem ensembles, network dropout, and data augmentation, have been employed to mitigate overfitting and improve the performance of machine-learning systems. Data augmentation in particular is easy to apply because it modifies only the input data, leaving the size and structure of the predefined model unchanged. However, existing data augmentation methods sometimes generate meaningless data that cannot improve ASC performance. To maximize the amount of meaningful augmented data, we therefore adopted the GridMask augmentation method, which was developed for image processing systems. Because the temporal and frequency axes of an audio spectrogram differ in nature, whereas the two axes of an image are equivalent, we propose checked pattern masking, which alters the data while preventing the loss of important information. The proposed masking strategy is promising because it does not mask a large part of the data at once. The performance of the proposed method was evaluated using ASC data from the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Task 1A dataset.
KW - acoustic scene classification
KW - data augmentation
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85192541004&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85192541004
SN - 2226-7808
JO - Proceedings of the International Congress on Acoustics
JF - Proceedings of the International Congress on Acoustics
T2 - 24th International Congress on Acoustics, ICA 2022
Y2 - 24 October 2022 through 28 October 2022
ER -