TSDCA-BA: An Ultra-Lightweight Speech Enhancement Model for Real-Time Hearing Aids with Multi-Scale STFT Fusion

Zujie Fan, Zikun Guo, Yanxing Lai, Jaesoo Kim

Research output: Contribution to journal > Article > peer-review

Abstract

Lightweight speech denoising models have made remarkable progress in improving both speech quality and computational efficiency. However, most models rely on long temporal windows as input, which limits their applicability in low-latency, real-time scenarios on edge devices. To address this challenge, we propose a lightweight hybrid module that combines three components: Temporal Statistics Enhancement (TSE), Squeeze-and-Excitation-based Dual Convolutional Attention (SDCA), and Band-wise Attention (BA). The TSE module enhances single-frame spectral features by concatenating statistical descriptors (mean, standard deviation, maximum, and minimum), thereby capturing richer local information without relying on temporal context. The combined SDCA and BA module integrates a simplified residual structure with channel attention, and its BA component further strengthens the representation of critical frequency bands through band-wise partitioning and differentiated weighting. The proposed model requires only 0.22 million multiply–accumulate operations (MMACs) and contains 112.3K parameters in total, making it well suited for low-latency, real-time speech enhancement applications. Experimental results demonstrate that, among lightweight models with fewer than 200K parameters, the proposed approach outperforms most existing methods in both denoising performance and computational efficiency, significantly reducing processing overhead. Furthermore, real-device deployment on an improved hearing aid confirms an inference latency as low as 2 milliseconds, validating its practical potential for real-time edge applications.
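
The TSE idea described in the abstract can be illustrated with a minimal sketch. It assumes that the four descriptors are computed across the frequency bins of a single magnitude-spectrum frame and appended to that frame; the abstract does not state the axis, the normalization, or the exact layout, so the function name `tse_features` and these choices are hypothetical and not the authors' implementation.

```python
import statistics


def tse_features(frame_magnitudes):
    """Append mean, std, max, and min of one frame's bins to the frame.

    Hypothetical helper: the statistics are taken over the frequency
    bins of a single frame (an assumption, not confirmed by the paper).
    """
    if not frame_magnitudes:
        raise ValueError("frame must contain at least one frequency bin")
    stats = [
        statistics.fmean(frame_magnitudes),   # mean over bins
        statistics.pstdev(frame_magnitudes),  # population standard deviation
        max(frame_magnitudes),                # maximum bin magnitude
        min(frame_magnitudes),                # minimum bin magnitude
    ]
    # Concatenate the original bins with the four descriptors.
    return list(frame_magnitudes) + stats


# Toy frame with four frequency bins (made-up values).
frame = [1.0, 2.0, 3.0, 4.0]
enhanced = tse_features(frame)
```

Because every descriptor comes from the current frame alone, a feature built this way needs no look-ahead or history buffer, which is consistent with the low-latency motivation stated above.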

Original language: English
Article number: 8183
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue number: 15
State: Published - Aug 2025

Keywords

  • audio denoising
  • band-wise attention
  • lightweight model
  • low complexity
  • multi-scale STFT
  • real-time speech enhancement
