Speech Emotion Recognition using Context-Aware Dilated Convolution Network

Samuel Kakuba, Dong Seog Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

8 Scopus citations

Abstract

Deep learning-based speech emotion recognition has been applied to social living assistance, health monitoring, authentication, and other human-to-machine interaction applications. Because these applications are ubiquitous, computationally efficient and robust speech emotion recognition models are required. A speech signal requires tracking time steps and analyzing long-term dependencies, the context of the utterances, and spatial cues. Recurrent neural networks such as long short-term memory and gated recurrent units, coupled with attention mechanisms, are often used to capture long-term dependencies and context in the speech signal. However, they do not account for the spatial cues that may exist in the speech signal. Moreover, most of these systems operate sequentially, which causes slow convergence and sluggish training. Therefore, we propose a model that combines dilated convolution layers with hybrid attention mechanisms. The model uses multi-head attention to extract the global context in the feature representations, which are then fed into a bidirectional long short-term memory network configured with self-attention to further handle context and long-term dependencies. The model takes spectral and voice quality features extracted from the raw speech signals as input. The proposed model achieves comparable performance in terms of F1 score and accuracy; its performance is also presented in terms of confusion matrices.
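
As a rough illustration of the dilated convolution idea the abstract mentions, the following minimal sketch shows how spacing out kernel taps widens the receptive field without adding weights. It is not the authors' model: the function names `dilated_conv1d` and `receptive_field`, the kernel values, and the dilation schedule are all hypothetical choices made for this example, and the abstract does not state the layer sizes or dilation rates actually used.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D dilated convolution, cross-correlation form.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Number of input samples covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    if len(signal) < span:
        return []
    return [
        # Kernel taps are applied `dilation` samples apart.
        sum(w * signal[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    field = 1
    for d in dilations:
        # Each layer extends the field by (kernel_size - 1) * dilation.
        field += (kernel_size - 1) * d
    return field


if __name__ == "__main__":
    frames = list(range(10))  # stand-in for a sequence of feature frames
    print(dilated_conv1d(frames, [1, 1, 1], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

With a kernel of size 3 and assumed dilations of 1, 2, 4, and 8, four layers already cover 31 input frames, which is the kind of wide context a stack of undilated size-3 kernels would need 15 layers to reach.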

Original language: English
Title of host publication: APCC 2022 - 27th Asia-Pacific Conference on Communications
Subtitle of host publication: Creating Innovative Communication Technologies for Post-Pandemic Era
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 601-604
Number of pages: 4
ISBN (Electronic): 9781665499279
DOIs
State: Published - 2022
Event: 27th Asia-Pacific Conference on Communications, APCC 2022 - Jeju Island, Korea, Republic of
Duration: 19 Oct 2022 to 21 Oct 2022

Publication series

Name: APCC 2022 - 27th Asia-Pacific Conference on Communications: Creating Innovative Communication Technologies for Post-Pandemic Era

Conference

Conference: 27th Asia-Pacific Conference on Communications, APCC 2022
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 19/10/22 to 21/10/22

Keywords

  • context-aware emotion recognition
  • dilated convolution
  • multi-head attention
