Long short-term memory recurrent neural network-based acoustic model using connectionist temporal classification on a large-scale training corpus

Donghyun Lee, Minkyu Lim, Hosung Park, Yoseb Kang, Jeong Sik Park, Gil Jin Jang, Ji Hwan Kim

Research output: Contribution to journal › Article › peer-review

55 Scopus citations

Abstract

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have driven tremendous improvements over acoustic models based on the Gaussian Mixture Model (GMM). However, these hybrid models require a force-aligned Hidden Markov Model (HMM) state sequence obtained from a GMM-based acoustic model, so training demands a long computation time for both the GMM-based acoustic model and the deep learning-based acoustic model. To solve this problem, an acoustic model using the Connectionist Temporal Classification (CTC) algorithm is proposed. The CTC algorithm does not require a GMM-based acoustic model because it does not use a force-aligned HMM state sequence. However, previous work on LSTM RNN-based acoustic models using CTC used only small-scale training corpora. In this paper, an LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves Word Error Rates (WERs) of 6.18% for clean speech and 15.01% for noisy speech, which is similar to the performance of acoustic models based on the hybrid method.
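The key property the abstract relies on is that CTC sums over all frame-level alignments of a label sequence, so no force-aligned HMM state sequence is needed. As a minimal illustration (not the paper's implementation), the sketch below computes the CTC negative log-likelihood with the standard forward (alpha) recursion over a blank-extended label sequence; the frame log-probabilities would normally come from the LSTM RNN's softmax outputs, but here they are just hypothetical inputs.

```python
import math

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """-log P(target | log_probs) via the CTC forward (alpha) recursion.

    log_probs: T frames, each a list of per-symbol log-probabilities
               (in practice, the LSTM's per-frame softmax outputs).
    target:    label index sequence with no blanks.
    """
    # Extended label sequence: blanks between and around the labels.
    ext = [blank]
    for s in target:
        ext += [s, blank]
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # Initialization: paths may start with a blank or the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            p = alpha[s]                      # stay on the same symbol
            if s > 0:
                p = logsumexp(p, alpha[s - 1])  # advance by one
            # Skip the intervening blank only between distinct labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                p = logsumexp(p, alpha[s - 2])
            new[s] = p + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or the trailing blank.
    return -logsumexp(alpha[S - 1], alpha[S - 2])
```

For two frames over the alphabet {blank, a} with uniform 0.5 probabilities, the three alignments "a-", "-a", and "aa" all collapse to "a", giving P = 0.75 and a loss of -log 0.75 ≈ 0.288 — the summation over alignments that replaces forced alignment.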

Original language: English
Article number: 8068761
Pages (from-to): 23-31
Number of pages: 9
Journal: China Communications
Volume: 14
Issue number: 9
DOIs
State: Published - Sep 2017

Keywords

  • acoustic model
  • connectionist temporal classification
  • large-scale training corpus
  • long short-term memory
  • recurrent neural network

