LSENet: A Lightweight Spectral Enhancement Network for High-Quality Speech Processing on Resource-Constrained Platforms

Research output: Contribution to journal › Article › peer-review

Abstract

Although recent deep-learning-based speech enhancement (SE) methods significantly outperform traditional approaches, their computational cost tends to grow with their performance. This makes them impractical to deploy on throughput-sensitive, resource-constrained edge devices. In this paper, we propose a novel lightweight spectral enhancement network (LSENet) designed to estimate high-quality speech with minimal computational overhead. The network is an encoder-decoder architecture built around a group-dilated convolutional module, which exploits time-frequency domain information efficiently while significantly reducing resource consumption through dilated convolutional groups and spectral-wise attention modules. In addition, an improved dual-path recurrent neural network is placed between the encoder and the decoder to capture long-range contextual dependencies in the extracted features. Experimental results on the Voicebank + Demand and DNS-Challenge datasets show that the proposed model performs competitively with state-of-the-art baseline models while requiring only 39.4 thousand model parameters and 237 million multiply-accumulate operations.
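As a rough illustration of why grouped, dilated convolutions reduce cost, the following sketch counts parameters and multiply-accumulate operations (MACs) for a single 2-D convolution layer using the standard formulas. The function names, channel counts, kernel size, and feature-map size are hypothetical choices made for this example; they are not taken from LSENet and do not reproduce its reported figures.

```python
def conv2d_params(in_ch, out_ch, kernel, groups=1, bias=True):
    """Learnable parameters of a 2-D convolution with `groups` channel groups."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("in_ch and out_ch must be divisible by groups")
    kh, kw = kernel
    # Each output channel only sees in_ch // groups input channels.
    weights = out_ch * (in_ch // groups) * kh * kw
    return weights + (out_ch if bias else 0)


def conv2d_macs(in_ch, out_ch, kernel, out_hw, groups=1):
    """Multiply-accumulate operations for one forward pass (bias ignored)."""
    kh, kw = kernel
    out_h, out_w = out_hw
    return out_ch * (in_ch // groups) * kh * kw * out_h * out_w


def receptive_field(kernel_size, dilation):
    """Span covered along one axis by a single dilated kernel."""
    return dilation * (kernel_size - 1) + 1


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 64 channels, 3x3 kernel, 100x50 output map.
    for g in (1, 4):
        p = conv2d_params(64, 64, (3, 3), groups=g)
        m = conv2d_macs(64, 64, (3, 3), (100, 50), groups=g)
        print(f"groups={g}: params={p}, MACs={m}")
    # Dilation widens the receptive field without adding parameters.
    print("receptive field (k=3, d=4):", receptive_field(3, 4))
```

With four groups, the weight count and MACs of this hypothetical layer drop by a factor of four, while a dilation of 4 stretches a 3-tap kernel over 9 positions at no extra parameter cost.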

Original language: English
Pages (from-to): 116934-116943
Number of pages: 10
Journal: IEEE Access
Volume: 13
State: Published - 2025

Keywords

  • Deep learning
  • attention mechanisms
  • factorized convolution
  • lightweight network
  • speech enhancement
