Multi-Layer Depth Weighted Fusion Approach for Speech Emotion Recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Fusion techniques have been proposed as a solution to data scarcity in speech emotion recognition (SER). Conventional fusion techniques are broadly classified into early, intermediate, and late fusion. Although they achieve commendable results in some cases, they are suboptimal: they limit the model's ability to learn the distinctive and salient features that are particularly crucial for performance in data-scarce scenarios. This limitation stems mainly from data sparsity and from the loss of emotional information caused by fusion. In this paper, we introduce a multi-layer depth weighted fusion approach for SER. The approach fuses the feature representations from two feature branches at the shallow, intermediate, and high-level stages of a deep network. It enhances SER performance by using attentive convolutional neural network (CNN) encoders, transformer encoders, and bidirectional long short-term memory (LSTM) to capture contextualized spatial and temporal feature relationships. The model achieves accuracy scores of 87.17%, 93.73%, and 96.58% on the KESDy18, RAVDESS, and EMODB datasets, respectively, and F1 scores of 87.84%, 93.73%, and 96.39% on the same datasets. The model was additionally evaluated on the SAVEE and CREMA datasets. These results highlight the effectiveness and robustness of our fusion approach across multiple emotional speech corpora.
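
The abstract does not give the fusion formula, so the following is only a minimal sketch of one way depth weighted fusion could work, not the authors' implementation. It assumes that each branch yields one feature vector per depth (shallow, intermediate, high-level), that the vectors are already projected to a common size, and that the depths are mixed with softmax-normalized weights. The names `fuse_pair`, `depth_weighted_fusion`, `depth_scores`, and `gate` are hypothetical and used for illustration only.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pair(feat_a, feat_b, gate):
    """Blend two equal-length feature vectors (assumed fusion rule).

    gate lies in [0, 1]; larger values favor branch A.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("branch features must have the same length")
    return [gate * a + (1.0 - gate) * b for a, b in zip(feat_a, feat_b)]


def depth_weighted_fusion(branch_a, branch_b, depth_scores, gate=0.5):
    """Fuse two branches at every depth, then mix the depths by weight.

    branch_a, branch_b: one feature vector per depth, in the same order.
    depth_scores: one raw score per depth; in a trained model these would
    be learnable parameters (an assumption, not stated in the abstract).
    """
    if not (len(branch_a) == len(branch_b) == len(depth_scores)):
        raise ValueError("need one score and one feature pair per depth")
    weights = softmax(depth_scores)
    fused_levels = [fuse_pair(a, b, gate) for a, b in zip(branch_a, branch_b)]
    dim = len(fused_levels[0])
    # Weighted sum over depths, computed element by element.
    return [
        sum(w * level[i] for w, level in zip(weights, fused_levels))
        for i in range(dim)
    ]


# Toy inputs: three depths, two-dimensional features per depth.
branch_a = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
branch_b = [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
fused = depth_weighted_fusion(branch_a, branch_b, depth_scores=[0.0, 0.0, 0.0])
print(fused)  # equal scores give equal depth weights, so roughly [3.0, 3.0]
```

With equal scores every depth contributes equally; raising one score shifts the fused representation toward that depth, which is the intuition behind weighting fusion by depth.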

Original language: English
Title of host publication: ICUFN 2025 - 16th International Conference on Ubiquitous and Future Networks
Publisher: IEEE Computer Society
Pages: 55-58
Number of pages: 4
ISBN (Electronic): 9798331524876
DOIs
State: Published - 2025
Event: 16th International Conference on Ubiquitous and Future Networks, ICUFN 2025 - Hybrid, Lisbon, Portugal
Duration: 8 Jul 2025 - 11 Jul 2025

Publication series

Name: International Conference on Ubiquitous and Future Networks, ICUFN
ISSN (Print): 2165-8528
ISSN (Electronic): 2165-8536

Conference

Conference: 16th International Conference on Ubiquitous and Future Networks, ICUFN 2025
Country/Territory: Portugal
City: Hybrid, Lisbon
Period: 8/07/25 - 11/07/25

Keywords

  • emotion recognition
  • multi-layer
  • weight fusion
