TY - GEN
T1 - RepAugment
T2 - 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024
AU - Kim, June Woo
AU - Toikkanen, Miika
AU - Bae, Sangmin
AU - Kim, Minseok
AU - Jung, Ho Young
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Recent advancements in AI have democratized its deployment as a healthcare assistant. While models pretrained on large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, being trained on human-originated sounds, would intuitively bear a closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and that data augmentation is essential to bridge this gap. However, the most widely used augmentation technique for audio and speech, SpecAugment, requires a 2-dimensional spectrogram format and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that not only outperforms SpecAugment but is also suitable for respiratory sound classification with waveform-pretrained models. Experimental results show that our approach outperforms SpecAugment, demonstrating a substantial improvement in the accuracy of minority disease classes, reaching up to 7.14%.
UR - https://www.scopus.com/pages/publications/85215005602
U2 - 10.1109/EMBC53108.2024.10782363
DO - 10.1109/EMBC53108.2024.10782363
M3 - Conference contribution
C2 - 40031507
AN - SCOPUS:85215005602
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
BT - 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 July 2024 through 19 July 2024
ER -