Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation

Han Gyu Kim, Gil Jin Jang, Yung Hwan Oh, Ho Jin Choi

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we propose a speech/music pitch classification method based on recurrent neural networks (RNNs) for monaural speech segregation from music interference. The speech segregation methods in this paper exploit sub-band masking to construct segregation masks modulated by the estimated speech pitch. For speech signals mixed with music, however, speech pitch estimation becomes unreliable because speech and music have similar harmonic structures. To remove the music interference effectively, we propose an RNN-based speech/music pitch classification. The proposed method models the temporal trajectories of speech and music pitch values and determines whether an unknown continuous pitch sequence belongs to speech or music. Among the various types of RNNs, we chose the simple recurrent network (SRN), long short-term memory (LSTM), and bidirectional LSTM for pitch classification. The experimental results show that the proposed method significantly outperforms the baseline methods on speech–music mixtures without loss of segregation performance on speech–noise mixtures.
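To make the classification step described above concrete, the following is a minimal sketch, in PyTorch (a framework the abstract does not specify), of a bidirectional LSTM that maps a continuous pitch trajectory to a speech/music decision. The layer sizes, the one-pitch-value-per-frame input framing, and the label convention are illustrative assumptions, not the paper's actual configuration.

    import torch
    import torch.nn as nn

    class PitchTrajectoryClassifier(nn.Module):
        """Bidirectional LSTM over a framewise pitch contour (illustrative)."""
        def __init__(self, hidden_size=64):
            super().__init__()
            # One scalar pitch estimate per frame enters the recurrent layer.
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                                batch_first=True, bidirectional=True)
            # Two output classes: speech pitch vs. music pitch.
            self.fc = nn.Linear(2 * hidden_size, 2)

        def forward(self, pitch_seq):
            # pitch_seq: (batch, frames, 1)
            out, _ = self.lstm(pitch_seq)
            # Classify from the last frame's forward+backward hidden states.
            return self.fc(out[:, -1, :])

    model = PitchTrajectoryClassifier()
    x = torch.randn(4, 100, 1)        # 4 dummy trajectories, 100 frames each
    labels = model(x).argmax(dim=1)   # 0 = speech, 1 = music (assumed labels)
    print(labels)

In this kind of setup, the bidirectional variant lets the decision at each point in the trajectory draw on both earlier and later pitch movement, which matches the abstract's motivation for including BiLSTM alongside SRN and unidirectional LSTM.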

Original language: English
Pages (from-to): 8193–8213
Number of pages: 21
Journal: Journal of Supercomputing
Volume: 76
Issue number: 10
State: Published - 1 Oct 2020

Keywords

  • Bidirectional long short-term memory
  • Long short-term memory
  • Pitch classification
  • Recurrent neural network
  • Speech pitch estimation
  • Speech segregation
