BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

June Woo Kim, Miika Toikkanen, Yera Choi, Seoung Eun Moon, Ho Young Jung

Research output: Contribution to journal › Conference article › peer-review

Abstract

Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata, which includes the patient's gender and age, the type of recording device, and the recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of leveraging metadata alongside respiratory sound samples to enhance RSC performance. Additionally, we investigate the model's performance when metadata is only partially available, as may occur in real-world clinical settings.
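The record includes no code, but the core idea of the abstract, converting a sample's structured metadata into a free-text description and feeding it to a text-audio model alongside the recording, can be sketched briefly. The sketch below is a minimal illustration under assumed names: `metadata_to_text`, `BTSClassifier`, the prompt template, and the concatenate-then-classify fusion head are hypothetical stand-ins, not the paper's actual pipeline, which fine-tunes a pretrained text-audio backbone.

```python
# Hypothetical sketch of metadata-to-text prompting for a CLAP-style
# dual-encoder model. All names and the prompt wording are illustrative,
# not taken from the paper's code.
import torch
import torch.nn as nn


def metadata_to_text(age: int, gender: str, device: str, location: str) -> str:
    """Turn structured metadata into a free-text description.

    The fields mirror those named in the abstract (age, gender,
    recording device, recording location on the body); the exact
    sentence template here is an assumption.
    """
    return (f"This respiratory sound was recorded from a {age}-year-old "
            f"{gender} patient with a {device} placed on the {location}.")


class BTSClassifier(nn.Module):
    """Fuse text and audio embeddings for 4-class ICBHI prediction
    (normal / crackle / wheeze / both)."""

    def __init__(self, audio_encoder: nn.Module, text_encoder: nn.Module,
                 embed_dim: int = 512, num_classes: int = 4):
        super().__init__()
        self.audio_encoder = audio_encoder  # stand-in for a pretrained backbone
        self.text_encoder = text_encoder    # stand-in for a pretrained backbone
        # Simple fusion: concatenate modality embeddings, then classify.
        self.head = nn.Linear(2 * embed_dim, num_classes)

    def forward(self, audio_feats: torch.Tensor,
                text_feats: torch.Tensor) -> torch.Tensor:
        a = self.audio_encoder(audio_feats)  # (batch, embed_dim)
        t = self.text_encoder(text_feats)    # (batch, embed_dim)
        return self.head(torch.cat([a, t], dim=-1))


# Toy usage: linear layers stand in for the pretrained encoders, and
# random tensors stand in for pooled audio/text features.
model = BTSClassifier(audio_encoder=nn.Linear(1024, 512),
                      text_encoder=nn.Linear(768, 512))
prompt = metadata_to_text(63, "male", "Meditron stethoscope", "posterior chest")
logits = model(torch.randn(2, 1024), torch.randn(2, 768))  # shape (2, 4)
```

For the partial-metadata case the abstract mentions, one simple choice (again an assumption, not necessarily the paper's approach) is to omit the missing field from the generated sentence so the text encoder still receives a well-formed description.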

Original language: English
Pages (from-to): 1690-1694
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
State: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 – 5 Sep 2024

Keywords

  • ICBHI
  • Metadata
  • Pretrained Language-Audio Model
  • Respiratory Sound Classification
